android_kernel_xiaomi_sm8350

Author	SHA1	Message	Date
Ingo Molnar	93093d099e	x86: provide readq()/writeq() on 32-bit too, complete if HAVE_READQ/HAVE_WRITEQ are defined, the full range of readq/writeq APIs has to be provided to drivers: drivers/infiniband/hw/amso1100/c2.c: In function 'c2_tx_ring_alloc': drivers/infiniband/hw/amso1100/c2.c:133: error: implicit declaration of function '__raw_writeq' So provide them on 32-bit as well. Also, map all the APIs to the strongest ordering variant. It's way too easy to mess such details up in drivers and the difference between "memory" and "" constrained asm() constructs is in the noise range. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-30 10:26:18 +01:00
Ingo Molnar	a0b1131e47	x86: provide readq()/writeq() on 32-bit too, cleanup Impact: cleanup Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-30 09:38:19 +01:00
Hitoshi Mitake	2c5643b1c5	x86: provide readq()/writeq() on 32-bit too Impact: add new API for drivers Add implementation of readq/writeq to x86_32, and add config value to the x86 architecture to determine existence of readq/writeq. Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-30 09:38:16 +01:00
Jiri Slaby	4385cecf1f	x86: intel_cacheinfo, minor show_type cleanup Impact: cleanup Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-30 07:46:31 +01:00
Thomas Bogendoerfer	7b1dedca42	x86: fix dma_mapping_error for 32bit x86 Devices like b44 ethernet can't dma from addresses above 1GB. The driver handles this cases by falling back to GFP_DMA allocation. But for detecting the problem it needs to get an indication from dma_mapping_error. The bug is triggered by using a VMSPLIT option of 2G/2G. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-29 21:00:38 +01:00
Pavel Machek	8caac56305	aperture_64.c: clarify that too small aperture is valid reason for this code Impact: update comment Clarify that too small aperture is valid reason for this code. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-28 15:24:39 +01:00
Ingo Molnar	5b3eec0c80	x86: ret_from_fork - get rid of jump back Impact: remove dead code If we take a closer look at the rff_trace/rff_action ret_from_fork code, we have to realize that it does all the wrong things: for example it checks the TIF flag - while later on jumping back to the ret-from-syscall path - duplicating the check needlessly. But checking for _TIF_SYSCALL_TRACE is completely unnecessary here because we clear that flag for every freshly forked task. So the whole "tracing" code here, for which there is a out of line jump optimization that makes it even harder to read, is in reality completely dead code ... Reported-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>	2008-11-28 15:01:46 +01:00
Ingo Molnar	3bdae4f464	Merge branch 'x86/debug' into x86/irq We merge this branch because x86/debug touches code that we started cleaning up in x86/irq. The two branches started out independent, but as unexpected amount of activity went into x86/irq, they became dependent. Resolve that by this cross-merge.	2008-11-28 15:00:48 +01:00
Cyrill Gorcunov	9f1e87ea3e	x86: entry_64.S - trivial: space, comments fixup Impact: cleanup Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-28 14:53:48 +01:00
Cyrill Gorcunov	5ae3a139cf	x86: uv bau interrupt -- use proper interrupt number Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Acked-by: Cliff Wickman <cpw@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-28 14:17:25 +01:00
Joerg Roedel	1d9b16d169	x86: move GART specific stuff from iommu.h to gart.h Impact: cleanup Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-28 13:06:27 +01:00
Cyrill Gorcunov	c2c631e318	x86: entry_64.S - use ENTRY to define child_rip child_rip is called not by its name but indirectly rather so make it global and aligned. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-27 13:04:07 +01:00
gorcunov@gmail.com	33454539f3	x86: entry_64.S - use X86_EFLAGS_IF instead of hardcoded number Impact: cleanup Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-27 13:00:32 +01:00
Joerg Roedel	b627c8b17c	x86: always define DECLARE_PCI_UNMAP* macros Impact: fix boot crash on AMD IOMMU if CONFIG_GART_IOMMU is off Currently these macros evaluate to a no-op except the kernel is compiled with GART or Calgary support. But we also need these macros when we have SWIOTLB, VT-d or AMD IOMMU in the kernel. Since we always compile at least with SWIOTLB we can define these macros always. This patch is also for stable backport for the same reason the SWIOTLB default selection patch is. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-27 12:44:08 +01:00
Alexander van Heukelum	d211af055d	i386: get rid of the use of KPROBE_ENTRY / KPROBE_END entry_32.S is now the only user of KPROBE_ENTRY / KPROBE_END, treewide. This patch reorders entry_64.S and explicitly generates a separate section for functions that need the protection. The generated code before and after the patch is equal. The KPROBE_ENTRY and KPROBE_END macro's are removed too. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-27 12:37:54 +01:00
Alexander van Heukelum	ddeb8f2149	x86_64: get rid of the use of KPROBE_ENTRY / KPROBE_END Impact: clean up assembly macros and annotations - with some object impact entry_64.S is the only user of KPROBE_ENTRY / KPROBE_END on x86_64. This patch reorders entry_64.S and explicitly generates a separate section for functions that need the protection. The generated code before and after the patch is equal. Implicitly changing sections in assembly files makes it more difficult to follow why the assembler is doing certain things. For example, .p2align 5 KPROBE_ENTRY(...) was not doing what you would expect. Other section changes (__ex_table, .fixup, .init.rodata) are done explicitly already. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Acked-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-27 12:37:53 +01:00
Ingo Molnar	c7cc773076	Merge branches 'tracing/blktrace', 'tracing/ftrace', 'tracing/function-graph-tracer' and 'tracing/power-tracer' into tracing/core	2008-11-27 10:56:13 +01:00
Marcelo Tosatti	6c475352e8	KVM: MMU: avoid creation of unreachable pages in the shadow It is possible for a shadow page to have a parent link pointing to a freed page. When zapping a high level table, kvm_mmu_page_unlink_children fails to remove the parent_pte link. For that to happen, the child must be unreachable via the shadow tree, which can happen in shadow_walk_entry if the guest pte was modified in between walk() and fetch(). Remove the parent pte reference in such case. Possible cause for oops in bug #2217430. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2008-11-26 12:34:27 +02:00
Hannes Eder	4db646b1af	x86: microcode: fix sparse warnings Impact: make global variables and a function static Fix following sparse warnings: arch/x86/kernel/microcode_core.c:102:22: warning: symbol 'microcode_ops' was not declared. Should it be static? arch/x86/kernel/microcode_core.c:206:24: warning: symbol 'microcode_pdev' was not declared. Should it be static? arch/x86/kernel/microcode_core.c:322:6: warning: symbol 'microcode_update_cpu' was not declared. Should it be static? arch/x86/kernel/microcode_intel.c:468:22: warning: symbol 'microcode_intel_ops' was not declared. Should it be static? Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-26 08:49:19 +01:00
Arjan van de Ven	f3f47a6768	tracing: add "power-tracer": C/P state tracer to help power optimization Impact: new "power-tracer" ftrace plugin This patch adds a C/P-state ftrace plugin that will generate detailed statistics about the C/P-states that are being used, so that we can look at detailed decisions that the C/P-state code is making, rather than the too high level "average" that we have today. An example way of using this is: mount -t debugfs none /sys/kernel/debug echo cstate > /sys/kernel/debug/tracing/current_tracer echo 1 > /sys/kernel/debug/tracing/tracing_enabled sleep 1 echo 0 > /sys/kernel/debug/tracing/tracing_enabled cat /sys/kernel/debug/tracing/trace \| perl scripts/trace/cstate.pl > out.svg Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-26 08:29:32 +01:00
Steven Rostedt	5a45cfe1c6	ftrace: use code patching for ftrace graph tracer Impact: more efficient code for ftrace graph tracer This patch uses the dynamic patching, when available, to patch the function graph code into the kernel. This patch will ease the way for letting both function tracing and function graph tracing run together. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-26 06:52:54 +01:00
Hiroshi Shimamoto	5ceb40da9b	x86: signal: unify signal_{32\|64}.c Impact: cleanup Unify signal_{32\|64}.c! Mechanic unification - the two files are the same. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-26 05:11:56 +01:00
Hiroshi Shimamoto	e5fa2d063c	x86: signal: unify signal_{32\|64}.c, prepare Impact: cleanup Add #ifdef directive for 32-bit only code. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-26 05:11:54 +01:00
Hiroshi Shimamoto	bfeb91a943	x86: signal: cosmetic unification of __setup_sigframe() and __setup_rt_sigframe() Impact: cleanup Add #ifdef directive to unify __setup_sigframe() and __setup_rt_sigframe(). Move them after {setup\|restore}_sigcontext() declaration. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-26 05:11:53 +01:00
Hiroshi Shimamoto	2601657d22	x86: signal: move {setup\|restore}_sigcontext() Impact: cleanup Move {setup\|restore}_sigcontext() declaration onto head of file. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-26 05:11:51 +01:00
Andreas Herrmann	ffd565a8b8	x86: fixup config space size of CPU functions for AMD family 11h Impact: extend allowed configuration space access on 11h CPUs from 256 to 4K Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-26 03:53:27 +01:00
Ingo Molnar	c2324b694f	tracing: function graph tracer, fix fix return-tracer => graph-tracer namespace rename fallout. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-26 03:10:01 +01:00
Frederic Weisbecker	287b6e68ca	tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-26 01:59:45 +01:00
Frederic Weisbecker	fb52607afc	tracing/function-return-tracer: change the name into function-graph-tracer Impact: cleanup This patch changes the name of the "return function tracer" into function-graph-tracer which is a more suitable name for a tracing which makes one able to retrieve the ordered call stack during the code flow. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-26 01:59:45 +01:00
Andreas Herrmann	a266d9f125	[CPUFREQ] powernow-k8: ignore out-of-range PstateStatus value A workaround for AMD CPU family 11h erratum 311 might cause that the P-state Status Register shows a "current P-state" which is larger than the "current P-state limit" in P-state Current Limit Register. For the wrong P-state value there is no ACPI _PSS object defined and powernow-k8/cpufreq can't determine the proper CPU frequency for that state. As a consequence this can cause a panic during boot (potentially with all recent kernel versions -- at least I have reproduced it with various 2.6.27 kernels and with the current .28 series), as an example: powernow-k8: Found 1 AMD Turion(tm)X2 Ultra DualCore Mobile ZM-82 processors (2 \ ) powernow-k8: 0 : pstate 0 (2200 MHz) powernow-k8: 1 : pstate 1 (1100 MHz) powernow-k8: 2 : pstate 2 (600 MHz) BUG: unable to handle kernel paging request at ffff88086e7528b8 IP: [<ffffffff80486361>] cpufreq_stats_update+0x4a/0x5f PGD 202063 PUD 0 Oops: 0002 [#1] SMP last sysfs file: CPU 1 Modules linked in: Pid: 1, comm: swapper Not tainted 2.6.28-rc3-dirty #16 RIP: 0010:[<ffffffff80486361>] [<ffffffff80486361>] cpufreq_stats_update+0x4a/0\ f Synaptics claims to have extended capabilities, but I'm not able to read them.<6\ 6 RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88006e7528c0 RDX: 00000000ffffffff RSI: ffff88006e54af00 RDI: ffffffff808f056c RBP: 00000000fffee697 R08: 0000000000000003 R09: ffff88006e73f080 R10: 0000000000000001 R11: 00000000002191c0 R12: ffff88006fb83c10 R13: 00000000ffffffff R14: 0000000000000001 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88006fb50740(0000) knlGS:0000000000000000 Unable to initialize Synaptics hardware. CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: ffff88086e7528b8 CR3: 0000000000201000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 1, threadinfo ffff88006fb82000, task ffff88006fb816d0) Stack: ffff88006e74da50 0000000000000000 ffff88006e54af00 ffffffff804863c7 ffff88006e74da50 0000000000000000 00000000ffffffff 0000000000000000 ffff88006fb83c10 ffffffff8024b46c ffffffff808f0560 ffff88006fb83c10 Call Trace: [<ffffffff804863c7>] ? cpufreq_stat_notifier_trans+0x51/0x83 [<ffffffff8024b46c>] ? notifier_call_chain+0x29/0x4c [<ffffffff8024b561>] ? __srcu_notifier_call_chain+0x46/0x61 [<ffffffff8048496d>] ? cpufreq_notify_transition+0x93/0xa9 [<ffffffff8021ab8d>] ? powernowk8_target+0x1e8/0x5f3 [<ffffffff80486687>] ? cpufreq_governor_performance+0x1b/0x20 [<ffffffff80484886>] ? __cpufreq_governor+0x71/0xa8 [<ffffffff80484b21>] ? __cpufreq_set_policy+0x101/0x13e [<ffffffff80485bcd>] ? cpufreq_add_dev+0x3f0/0x4cd [<ffffffff8048577a>] ? handle_update+0x0/0x8 [<ffffffff803c2062>] ? sysdev_driver_register+0xb6/0x10d [<ffffffff8056592c>] ? powernowk8_init+0x0/0x7e [<ffffffff8048604c>] ? cpufreq_register_driver+0x8f/0x140 [<ffffffff80209056>] ? _stext+0x56/0x14f [<ffffffff802c2234>] ? proc_register+0x122/0x17d [<ffffffff802c23a0>] ? create_proc_entry+0x73/0x8a [<ffffffff8025c259>] ? register_irq_proc+0x92/0xaa [<ffffffff8025c2c8>] ? init_irq_proc+0x57/0x69 [<ffffffff807fc85f>] ? kernel_init+0x116/0x169 [<ffffffff8020cc79>] ? child_rip+0xa/0x11 [<ffffffff807fc749>] ? kernel_init+0x0/0x169 [<ffffffff8020cc6f>] ? child_rip+0x0/0x11 Code: 05 c5 83 36 00 48 c7 c2 48 5d 86 80 48 8b 04 d8 48 8b 40 08 48 8b 34 02 48\ RIP [<ffffffff80486361>] cpufreq_stats_update+0x4a/0x5f RSP <ffff88006fb83b20> CR2: ffff88086e7528b8 ---[ end trace 0678bac75e67a2f7 ]--- Kernel panic - not syncing: Attempted to kill init! In short, aftereffect of the wrong P-state is that cpufreq_stats_update() uses "-1" as index for some array in cpufreq_stats_update (unsigned int cpu) { ... if (stat->time_in_state) stat->time_in_state[stat->last_index] = cputime64_add(stat->time_in_state[stat->last_index], cputime_sub(cur_time, stat->last_time)); ... } Fortunately, the wrong P-state value is returned only if the core is in P-state 0. This fix solves the problem by detecting the out-of-range P-state, ignoring it, and using "0" instead. Cc: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Dave Jones <davej@redhat.com>	2008-11-25 13:38:29 -05:00
Markus Metzger	1e9b51c283	x86, bts, ftrace: a BTS ftrace plug-in prototype Impact: add new ftrace plugin A prototype for a BTS ftrace plug-in. The tracer collects branch trace in a cyclic buffer for each cpu. The tracer is not configurable and the trace for each snapshot is appended when doing cat /debug/tracing/trace. This is a proof of concept that will be extended with future patches to become a (hopefully) useful tool. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-25 17:31:13 +01:00
Markus Metzger	6abb11aecd	x86, bts, ptrace: move BTS buffer allocation from ds.c into ptrace.c Impact: restructure DS memory allocation to be done by the usage site of DS Require pre-allocated buffers in ds.h. Move the BTS buffer allocation for ptrace into ptrace.c. The pointer to the allocated buffer is stored in the traced task's task_struct together with the handle returned by ds_request_bts(). Removes memory accounting code. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-25 17:31:12 +01:00
Markus Metzger	ca0002a179	x86, bts: base in-kernel ds interface on handles Impact: generalize the DS code to shared buffers Change the in-kernel ds.h interface to identify the tracer via a handle returned on ds_request_~(). Tracers used to be identified via their task_struct. The changes are required to allow DS to be shared between different tasks, which is needed for perfmon2 and for ftrace. For ptrace, the handle is stored in the traced task's task_struct. This should probably go into a (arch-specific) ptrace context some time. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-25 17:31:11 +01:00
Ingo Molnar	7d55718b0c	Merge branches 'tracing/core', 'x86/urgent' and 'x86/ptrace' into tracing/hw-branch-tracing This pulls together all the topic branches that are needed for the DS/BTS/PEBS tracing work.	2008-11-25 17:30:30 +01:00
Markus Metzger	de90add30e	x86, bts: fix wrmsr and spinlock over kmalloc Impact: fix sleeping-with-spinlock-held bugs/crashes - Turn a wrmsr to write the DS_AREA MSR into a wrmsrl. - Use irqsave variants of spinlocks. - Do not allocate memory while holding spinlocks. Reported-by: Stephane Eranian <eranian@googlemail.com> Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-25 17:29:02 +01:00
Markus Metzger	c4858ffc8f	x86, pebs: fix PEBS record size configuration Impact: fix DS hw enablement on 64-bit x86 Fix the PEBS record size in the DS configuration. Reported-by: Stephane Eranian <eranian@googlemail.com> Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-25 17:28:53 +01:00
Markus Metzger	e5e8ca633b	x86, bts: turn macro into static inline function Impact: cleanup Replace a macro with a static inline function. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-25 17:28:51 +01:00
Markus Metzger	292c669cd7	x86, bts: exclude ds.c from build when disabled Impact: cleanup Move the CONFIG guard from the .c file into the makefile. Reported-by: Andi Kleen <andi-suse@firstfloor.org> Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-25 17:28:50 +01:00
Julia Lawall	eff79aee91	arch/x86/kernel/pci-calgary_64.c: change simple_strtol to simple_strtoul Impact: fix theoretical option string parsing overflow Since bridge is unsigned, it would seem better to use simple_strtoul that simple_strtol. A simplified version of the semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @r2@ long e; position p; @@ e = simple_strtol@p(...) @@ position p != r2.p; type T; T e; @@ e = - simple_strtol@p + simple_strtoul (...) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: muli@il.ibm.com Cc: jdmason@kudzu.us Cc: discuss@x86-64.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-25 15:56:03 +01:00
Steven Rostedt	5cf02b7baf	x86: use limited register constraint for setnz Impact: build fix with certain compilers GCC can decide to use %dil when "r" is used, which is not valid for setnz. This bug was brought out by Stephen Rothwell's merging of the branch tracer into linux-next. [ Thanks to Uros Bizjak for recommending 'q' over 'Q' ] Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-25 15:38:03 +01:00
Ingo Molnar	e951e4af2e	x86: fix unused variable warning in arch/x86/kernel/hpet.c Impact: fix build warning this warning: arch/x86/kernel/hpet.c:36: warning: ‘hpet_num_timers’ defined but not used Triggers because hpet_num_timers is unused in the !CONFIG_PCI_MSI case. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-25 09:03:43 +01:00
Ingo Molnar	6f893fb2e8	Merge branches 'tracing/branch-tracer', 'tracing/fastboot', 'tracing/ftrace', 'tracing/function-return-tracer', 'tracing/power-tracer', 'tracing/powerpc', 'tracing/ring-buffer', 'tracing/stack-tracer' and 'tracing/urgent' into tracing/core	2008-11-24 17:46:24 +01:00
Ingo Molnar	ad07e914e6	x86 defconfig: increase CONFIG_LOG_BUF_SHIFT Impact: double the defconfig printk buffer Booting defconfigs produces more output than 128K so the output is truncated - double it to 256K. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-24 11:33:12 +01:00
H. Peter Anvin	b47b928842	x86: drop REBOOT_CF9_COND from reboot fallback chain Impact: Reverts sequence of reboot fallbacks Checkin `14d7ca5c57` changed the default reboot method to "pci", a.k.a. port CF9. Unfortunately this has been shown to cause lockups on at least two systems for which REBOOT_KBD worked, both Thinkpads with Intel chipsets. Checkin `3889d0cea2` reverted the default, but did not revert the fallback chain. This checkin reverts the fallback chain; port CF9 is now only done by explicit "reboot=pci" or a future potential DMI key. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-24 00:50:09 -08:00
Hannes Eder	3b71e9e307	x86: HPET: fix sparse warning Impact: make global variable static Fix this sparse warning: arch/x86/kernel/hpet.c:36:18: warning: symbol 'hpet_num_timers' was not declared. Should it be static? Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 20:23:37 +01:00
jia zhang	5f5db59132	x86, debug: remove the confusing entry in call trace Impact: improve backtrace quality avoid the confusion in call trace because of the lack of padding at the tail of function. When do_exit gets called, the return address behind call instruction is pushed into stack. If something get wrong in do_exit, for x86_64, the entry "kernel_execve +0x00/0xXX" rather than "child_rip +0xYY/0xZZ" is in the call trace. That looks confusing, so add a u2d to make the return address still part of the original call site. (This also catches any instances of us returning from that function somehow.) Signed-off-by: jia zhang <jia.zhang2008@gmail.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 20:03:36 +01:00
Hannes Eder	a1a00b5885	x86: boot - fix sparse warnings Impact: make global variables static Fix these sparse warnings: arch/x86/boot/video.c:233:3: warning: symbol 'saved' was not declared. Should it be static? arch/x86/boot/video-vga.c:37:13: warning: symbol 'video_vga' was not declared. Should it be static? Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 19:58:58 +01:00
Cyrill Gorcunov	3b6c52b5b6	x86: introduce ENTRY(KPROBE_ENTRY)_X86 assembly helpers to catch unbalanced declaration v3 Impact: make ENTRY()/END() macros more capable It's usefull to catch unbalanced or messed or mixed declarations of ENTRY and KPROBES. These macros would help a bit. For example the following code would compile without problems ENTRY_X86(mcount) retq END_X86(mcount) But if you forget and mess the following form ENTRY_X86(mcount) retq END(mcount) ENTRY_X86(ftrace_caller) The assembler will issue the following message: Error: ENTRY_X86/KPROBE_X86 unbalanced,missed,mixed Actually the checking is performed at every _X86 macro so maybe it's good idea to put ENTRY_KPROBE_FINAL_X86 at the end of .S file to be sure you didn't miss anything. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Alexander van Heukelum <heukelum@mailshack.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 19:57:01 +01:00
Marcelo Tosatti	0c0f40bdbe	KVM: MMU: fix sync of ptes addressed at owner pagetable During page sync, if a pagetable contains a self referencing pte (that points to the pagetable), the corresponding spte may be marked as writable even though all mappings are supposed to be write protected. Fix by clearing page unsync before syncing individual sptes. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2008-11-23 15:24:19 +02:00
Alexander van Heukelum	6efdcfaf16	x86: KPROBE_ENTRY should be paired wth KPROBE_END Impact: move some code out of .kprobes.text KPROBE_ENTRY switches code generation to .kprobes.text, and KPROBE_END uses .popsection to get back to the previous section (.text, normally). Also replace ENDPROC by END, for consistency. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 14:21:55 +01:00
Alexander van Heukelum	322648d1ba	x86: include ENTRY/END in entry handlers in entry_64.S Impact: cleanup of entry_64.S Except for the order and the place of the functions, this patch should not change the generated code. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 14:21:54 +01:00
Avi Kivity	bd2b3ca768	KVM: VMX: Fix interrupt loss during race with NMI If an interrupt cannot be injected for some reason (say, page fault when fetching the IDT descriptor), the interrupt is marked for reinjection. However, if an NMI is queued at this time, the NMI will be injected instead and the NMI will be lost. Fix by deferring the NMI injection until the interrupt has been injected successfully. Analyzed by Jan Kiszka. Signed-off-by: Avi Kivity <avi@redhat.com>	2008-11-23 14:52:29 +02:00
Hannes Eder	050dc6944b	x86: remove duplicate #define from 'cpufeature.h' Impact: cleanup Remove duplicate #define from 'cpufeature.h'. This also fixes the following sparse warning: arch/x86/kernel/cpu/capflags.c:54:3: warning: Initializer entry defined twice arch/x86/kernel/cpu/capflags.c:58:3: also defined here Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 13:38:50 +01:00
Ian Campbell	86bbc2c235	xen: pin correct PGD on suspend Impact: fix Xen guest boot failure commit `eefb47f6a1` ("xen: use spin_lock_nest_lock when pinning a pagetable") changed xen_pgd_walk to walk over mm->pgd rather than taking pgd as an argument. This breaks xen_mm_(un)pin_all() because it makes init_mm.pgd readonly instead of the pgd we are interested in and therefore the pin subsequently fails. (XEN) mm.c:2280:d15 Bad type (saw 00000000e8000001 != exp 0000000060000000) for mfn bc464 (pfn 21ca7) (XEN) mm.c:2665:d15 Error while pinning mfn bc464 [ 14.586913] 1 multicall(s) failed: cpu 0 [ 14.586926] Pid: 14, comm: kstop/0 Not tainted 2.6.28-rc5-x86_32p-xenU-00172-gee2f6cc #200 [ 14.586940] Call Trace: [ 14.586955] [<c030c17a>] ? printk+0x18/0x1e [ 14.586972] [<c0103df3>] xen_mc_flush+0x163/0x1d0 [ 14.586986] [<c0104bc1>] __xen_pgd_pin+0xa1/0x110 [ 14.587000] [<c015a330>] ? stop_cpu+0x0/0xf0 [ 14.587015] [<c0104d7b>] xen_mm_pin_all+0x4b/0x70 [ 14.587029] [<c022bcb9>] xen_suspend+0x39/0xe0 [ 14.587042] [<c015a330>] ? stop_cpu+0x0/0xf0 [ 14.587054] [<c015a3cd>] stop_cpu+0x9d/0xf0 [ 14.587067] [<c01417cd>] run_workqueue+0x8d/0x150 [ 14.587080] [<c030e4b3>] ? _spin_unlock_irqrestore+0x23/0x40 [ 14.587094] [<c014558a>] ? prepare_to_wait+0x3a/0x70 [ 14.587107] [<c0141918>] worker_thread+0x88/0xf0 [ 14.587120] [<c01453c0>] ? autoremove_wake_function+0x0/0x50 [ 14.587133] [<c0141890>] ? worker_thread+0x0/0xf0 [ 14.587146] [<c014509c>] kthread+0x3c/0x70 [ 14.587157] [<c0145060>] ? kthread+0x0/0x70 [ 14.587170] [<c0109d1b>] kernel_thread_helper+0x7/0x10 [ 14.587181] call 1/3: op=14 arg=[c0415000] result=0 [ 14.587192] call 2/3: op=14 arg=[e1ca2000] result=0 [ 14.587204] call 3/3: op=26 arg=[c1808860] result=-22 Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 13:32:24 +01:00
Cyrill Gorcunov	8a2503fa4a	x86: move dwarf2 related macro to dwarf2.h Impact: cleanup Move recently introduced dwarf2 macros to dwarf2.h file. It allow us to not duplicate them in assembly files. Active usage of _cfi macros don't make assembly files more obvious to understand but we already have a lot of macros there which requires to search the definitions of them anyway. But at least it make every cfi usage one line shorter. Also some code alignment is done. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 13:20:52 +01:00
Ingo Molnar	3d994e1076	Merge branch 'oprofile-for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into x86/urgent	2008-11-23 12:16:57 +01:00
Thomas Gleixner	a1967d6441	x86: revert irq number limitation Impact: fix MSIx not enough irq numbers available regression The manual revert of the sparse_irq patches missed to bring the number of possible irqs back to the .27 status. This resulted in a regression when two multichannel network cards were placed in a system with only one IO_APIC - causing the networking driver to not have the right IRQ and the device not coming up. Remove the dynamic allocation logic leftovers and simply return NR_IRQS in probe_nr_irqs() for now. Fixes: http://lkml.org/lkml/2008/11/19/354 Reported-by: Jesper Dangaard Brouer <hawk@diku.dk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jesper Dangaard Brouer <hawk@diku.dk> Acked-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 11:59:52 +01:00
Török Edwin	8d26487fd4	tracing/stack-tracer: introduce CONFIG_USER_STACKTRACE_SUPPORT Impact: cleanup User stack tracing is just implemented for x86, but it is not x86 specific. Introduce a generic config flag, that is currently enabled only for x86. When other arches implement it, they will have to SELECT USER_STACKTRACE_SUPPORT. Signed-off-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 11:53:50 +01:00
Török Edwin	8d7c6a9616	tracing/stack-tracer: fix style issues Impact: cleanup Signed-off-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 11:53:48 +01:00
Hannes Eder	4e42ebd57b	x86: hypervisor - fix sparse warnings Impact: fix sparse build warning Fix the following sparse warnings: arch/x86/kernel/cpu/hypervisor.c:37:15: warning: symbol 'get_hypervisor_tsc_freq' was not declared. Should it be static? arch/x86/kernel/cpu/hypervisor.c:53:16: warning: symbol 'init_hypervisor' was not declared. Should it be static? Signed-off-by: Hannes Eder <hannes@hanneseder.net> Cc: "Alok N Kataria" <akataria@vmware.com> Cc: "Dan Hecht" <dhecht@vmware.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 11:11:52 +01:00
Hannes Eder	c450d7805b	x86: vmware - fix sparse warnings Impact: fix sparse build warning Fix the following sparse warnings: arch/x86/kernel/cpu/vmware.c:69:5: warning: symbol 'vmware_platform' was not declared. Should it be static? arch/x86/kernel/cpu/vmware.c:89:15: warning: symbol 'vmware_get_tsc_khz' was not declared. Should it be static? arch/x86/kernel/cpu/vmware.c:107:16: warning: symbol 'vmware_set_feature_bits' was not declared. Should it be static? Signed-off-by: Hannes Eder <hannes@hanneseder.net> Cc: "Alok N Kataria" <akataria@vmware.com> Cc: "Dan Hecht" <dhecht@vmware.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 11:02:36 +01:00
Ingo Molnar	9f14416442	Merge commit 'v2.6.28-rc6' into irq/urgent	2008-11-23 10:52:33 +01:00
Hiroshi Shimamoto	2456d738ef	x86: signal: cosmetic unification of sys_rt_sigreturn() Impact: cleanup Add #ifdef directive for unification. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 10:50:59 +01:00
Hiroshi Shimamoto	666ac7be04	x86: signal: cosmetic unification of sys_sigaltstack() Impact: cleanup Add #ifdef directive for unification. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 10:50:58 +01:00
Hiroshi Shimamoto	5c9b3a0c7b	x86: signal: cosmetic unification of including headers Impact: cleanup Make the headers portion of signal_32.c and signal_64.c the same. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 10:50:57 +01:00
Török Edwin	02b67518e2	tracing: add support for userspace stacktraces in tracing/iter_ctrl Impact: add new (default-off) tracing visualization feature Usage example: mount -t debugfs nodev /sys/kernel/debug cd /sys/kernel/debug/tracing echo userstacktrace >iter_ctrl echo sched_switch >current_tracer echo 1 >tracing_enabled .... run application ... echo 0 >tracing_enabled Then read one of 'trace','latency_trace','trace_pipe'. To get the best output you can compile your userspace programs with frame pointers (at least glibc + the app you are tracing). Signed-off-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 09:25:15 +01:00
Frederic Weisbecker	f201ae2356	tracing/function-return-tracer: store return stack into task_struct and allocate it dynamically Impact: use deeper function tracing depth safely Some tests showed that function return tracing needed a more deeper depth of function calls. But it could be unsafe to store these return addresses to the stack. So these arrays will now be allocated dynamically into task_struct of current only when the tracer is activated. Typical scheme when tracer is activated: - allocate a return stack for each task in global list. - fork: allocate the return stack for the newly created task - exit: free return stack of current - idle init: same as fork I chose a default depth of 50. I don't have overruns anymore. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 09:17:26 +01:00
Ingo Molnar	a0a70c735e	Merge branches 'tracing/profiling', 'tracing/options' and 'tracing/urgent' into tracing/core	2008-11-23 09:10:32 +01:00
Ingo Molnar	f377fa123d	x86: clean up stack overflow debug check Impact: cleanup Simplify the irq-sampled stack overflow debug check: - eliminate an #idef - use WARN_ONCE() to emit a single warning (all bets are off after the first such warning anyway) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 09:04:39 +01:00
jia zhang	3aeb95d5b7	x86_64: fix the check in stack_overflow_check Impact: make stack overflow debug check and printout narrower stack_overflow_check() should consider the stack usage of pt_regs, and thus it could warn us in advance. Additionally, it looks better for the warning time to start at INITIAL_JIFFIES. Assuming that rsp gets close to the check point before interrupt arrives: when interrupt really happens, thread_info will be partly overrode. Signed-off-by: jia zhang <jia.zhang2008@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-23 08:57:42 +01:00
Ingo Molnar	ca9eed7613	Merge commit 'v2.6.28-rc6' into x86/debug	2008-11-23 08:55:47 +01:00
H. Peter Anvin	3889d0cea2	x86: revert default reboot method to REBOOT_KBD Impact: Reverts default reboot method. Checkin `14d7ca5c57` changed the default reboot method to "pci", a.k.a. port CF9. Unfortunately this has been shown to cause lockups on at least two systems for which REBOOT_KBD worked, both Thinkpads with Intel chipsets. This reverts the default to REBOOT_KBD, while leaving the option to have "reboot=pci" specified explicitly or via a DMI match. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-22 23:39:23 -08:00
Alexander van Heukelum	c81084114f	x86: split out some macro's and move common code to paranoid_exit, fix Impact: fix bootup crash Even though it tested fine for me, there was still a bug in the first patch: I have overlooked a call to ptregscall_common. This patch fixes that, I think, but the code is never executed for me while running a debian install... (I tested this by putting an "1:jmp 1b" in there.) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-22 09:45:50 +01:00
Ingo Molnar	57550b27ff	Merge commit 'v2.6.28-rc6' into x86/urgent	2008-11-21 20:55:09 +01:00
Alexander van Heukelum	b8b1d08bf6	x86: entry_64.S: split out some macro's and move common code to paranoid_exit Impact: cleanup DISABLE_INTERRUPTS(CLBR_NONE)/TRACE_IRQS_OFF is now always executed just before paranoid_exit. Move it there. Split out paranoidzeroentry, paranoiderrorentry, and paranoidzeroentry_ist to get more readable macro's. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-21 19:02:56 +01:00
Alexander van Heukelum	e2f6bc25b9	x86: entry_64.S: factor out save_paranoid and paranoid_exit Impact: cleanup, shrink kernel image size Also expand the paranoid_exit0 macro into nmi_exit inside the nmi stub in the case of enabled irq-tracing. This gives a few hundred bytes code size reduction. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-21 19:02:55 +01:00
Alexander van Heukelum	c002a1e6b6	x86: introduce save_rest and restructure the PTREGSCALL macro in entry_64.S Impact: cleanup The save_rest function completes a partial stack frame for use by the PTREGSCALL macro. This also avoids the indirect call in PTREGSCALLs. This adds the macro movq_cfi_restore to hide the CFI_RESTORE annotation when restoring a register from the stack frame. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-21 19:02:54 +01:00
Ingo Molnar	14ae22ba2b	x86: entry_64.S: rename Impact: cleanup Rename: CFI_PUSHQ => pushq_cfi CFI_POPQ => popq_cfi CFI_MOVQ => movq_cfi To make it blend better into regular assembly code. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-21 15:20:47 +01:00
Ingo Molnar	e8a0e27662	x86: clean up after: move entry_64.S register saving out of the macros, fix Impact: build fix The break builds with older binutils (2.16.1): arch/x86/kernel/entry_64.S: Assembler messages: arch/x86/kernel/entry_64.S:282: Error: too many positional arguments arch/x86/kernel/entry_64.S:283: Error: too many positional arguments arch/x86/kernel/entry_64.S:284: Error: too many positional arguments arch/x86/kernel/entry_64.S:285: Error: too many positional arguments arch/x86/kernel/entry_64.S:286: Error: too many positional arguments arch/x86/kernel/entry_64.S:287: Error: too many positional arguments arch/x86/kernel/entry_64.S:288: Error: too many positional arguments arch/x86/kernel/entry_64.S:289: Error: too many positional arguments arch/x86/kernel/entry_64.S:290: Error: too many positional arguments Took some time to figure out the detail that GAS chokes on: it's negative offsets. Rearrange the calculations to make sure we never go negative. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-21 15:12:28 +01:00
Ingo Molnar	fc02e90c34	Merge commit 'v2.6.28-rc6' into sched/core	2008-11-21 08:57:04 +01:00
Hiroshi Shimamoto	3ddd972d97	x86: signal: rename COPY_SEG_STRICT to COPY_SEG_CPL3 Impact: cleanup Rename macro COPY_SEG_STRICT to COPY_SEG_CPL3, as suggested by hpa. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-21 08:54:28 +01:00
Matthew Wilcox	0ca4b6b001	x86: Fix interrupt leak due to migration When we migrate an interrupt from one CPU to another, we set the move_in_progress flag and clean up the vectors later once they're not being used. If you're unlucky and call destroy_irq() before the vectors become un-used, the move_in_progress flag is never cleared, which causes the interrupt to become unusable. This was discovered by Jesse Brandeburg for whom it manifested as an MSI-X device refusing to use MSI-X mode when the driver was unloaded and reloaded repeatedly. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-11-20 13:17:40 -08:00
Linus Torvalds	0260da162f	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: uaccess_64: fix return value in __copy_from_user() x86: quirk for reboot stalls on a Dell Optiplex 330	2008-11-20 13:09:32 -08:00
Alexander van Heukelum	dcd072e260	x86: clean up after: move entry_64.S register saving out of the macros This add-on patch to x86: move entry_64.S register saving out of the macros visually cleans up the appearance of the code by introducing some basic helper macro's. It also adds some cfi annotations which were missing. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-20 19:05:21 +01:00
Rakib Mullick	bfe085f62f	x86: fixing __cpuinit/__init tangle, xsave_cntxt_init() Annotate xsave_cntxt_init() as "can be called outside of __init". Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-20 16:43:42 +01:00
Rakib Mullick	9bc646f163	x86: fix __cpuinit/__init tangle in init_thread_xstate() Impact: fix incorrect __init annotation This patch removes the following section mismatch warning. A patch set was send previously (http://lkml.org/lkml/2008/11/10/407). But introduce some other problem, reported by Rufus (http://lkml.org/lkml/2008/11/11/46). Then Ingo Molnar suggest that, it's best to remove __init from xsave_cntxt_init(void). Which is the second patch in this series. Now, this one removes the following warning. WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x2237): Section mismatch in reference from the function cpu_init() to the function .init.text:init_thread_xstate() The function __cpuinit cpu_init() references a function __init init_thread_xstate(). If init_thread_xstate is only used by cpu_init then annotate init_thread_xstate with a matching annotation. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-20 16:43:41 +01:00
Alexander van Heukelum	d99015b1ab	x86: move entry_64.S register saving out of the macros Here is a combined patch that moves "save_args" out-of-line for the interrupt macro and moves "error_entry" mostly out-of-line for the zeroentry and errorentry macros. The save_args function becomes really straightforward and easy to understand, with the possible exception of the stack switch code, which now needs to copy the return address of to the calling function. Normal interrupts arrive with ((~vector)-0x80) on the stack, which gets adjusted in common_interrupt: <common_interrupt>: (5) addq $0xffffffffffffff80,(%rsp) /* -> ~(vector) / (4) sub $0x50,%rsp / space for registers / (5) callq ffffffff80211290 <save_args> (5) callq ffffffff80214290 <do_IRQ> <ret_from_intr>: ... An apic interrupt stub now look like this: <thermal_interrupt>: (5) pushq $0xffffffffffffff05 / ~(vector) / (4) sub $0x50,%rsp / space for registers / (5) callq ffffffff80211290 <save_args> (5) callq ffffffff80212b8f <smp_thermal_interrupt> (5) jmpq ffffffff80211f93 <ret_from_intr> Similarly the exception handler register saving function becomes simpler, without the need of any parameter shuffling. The stub for an exception without errorcode looks like this: <overflow>: (6) callq 0x1cad12(%rip) # ffffffff803dd448 <pv_irq_ops+0x38> (2) pushq $0xffffffffffffffff /* no syscall / (4) sub $0x78,%rsp / space for registers / (5) callq ffffffff8030e3b0 <error_entry> (3) mov %rsp,%rdi / pt_regs pointer / (2) xor %esi,%esi / no error code / (5) callq ffffffff80213446 <do_overflow> (5) jmpq ffffffff8030e460 <error_exit> And one for an exception with errorcode like this: <segment_not_present>: (6) callq 0x1cab92(%rip) # ffffffff803dd448 <pv_irq_ops+0x38> (4) sub $0x78,%rsp /* space for registers / (5) callq ffffffff8030e3b0 <error_entry> (3) mov %rsp,%rdi / pt_regs pointer / (5) mov 0x78(%rsp),%rsi / load error code / (9) movq $0xffffffffffffffff,0x78(%rsp) / no syscall */ (5) callq ffffffff80213209 <do_segment_not_present> (5) jmpq ffffffff8030e460 <error_exit> Unfortunately, this last type is more than 32 bytes. But the total space savings due to this patch is about 2500 bytes on an smp-configuration, and I think the code is clearer than it was before. The tested kernels were non-paravirt ones (i.e., without the indirect call at the top of the exception handlers). Anyhow, I tested this patch on top of a recent -tip. The machine was an 2x4-core Xeon at 2333MHz. Measured where the delays between (almost-)adjacent rdtsc instructions. The graphs show how much time is spent outside of the program as a function of the measured delay. The area under the graph represents the total time spent outside the program. Eight instances of the rdtsctest were started, each pinned to a single cpu. The histogams are added. For each kernel two measurements were done: one in mostly idle condition, the other while running "bonnie++ -f", bound to cpu 0. Each measurement took 40 minutes runtime. See the attached graphs for the results. The graphs overlap almost everywhere, but there are small differences. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-20 10:49:57 +01:00
Ingo Molnar	c032a2de4c	Merge branch 'x86/cleanups' into x86/irq [ merged x86/cleanups into x86/irq to enable a wider IRQ entry code patch to be applied, which depends on a cleanup patch in x86/cleanups. ]	2008-11-20 10:48:31 +01:00
Yinghai Lu	87f7606591	x86: fix wakeup_cpu with numaq/es7000 v2 - call ->update_genapic() Impact: fix boot crash on 32-bit Hiroshi Shimamoto reported a boot failure on 32-bit x86. The setting of x86_quirks.wakeup_cpu is missing (when not passing in an explicit apic= boot parameter). Reported-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-20 10:45:17 +01:00
Richard A. Holden III	bb5574608a	x86: fix arch/x86/kernel/setup.c build warning when !CONFIG_X86_RESERVE_LOW_64K Impact: cleanup Fix: arch/x86/kernel/setup.c:592: warning: 'dmi_low_memory_corruption' defined but not used this is only used if CONFIG_X86_RESERVE_LOW_64K is defined. Signed-off-by: Richard A. Holden III <aciddeath@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-20 09:04:25 +01:00
Ingo Molnar	90accd6fab	Merge branch 'linus' into x86/memory-corruption-check	2008-11-20 09:03:38 +01:00
Richard A. Holden III	77be80e437	x86: fix arch/x86/kernel/genx2apic_uv_x.c build warning when !CONFIG_HOTPLUG_CPU Impact: cleanup, reduce size of the kernel image a bit Fix: arch/x86/kernel/genx2apic_uv_x.c:403: warning: 'uv_heartbeat_disable' defined but not used the function is only used when CONFIG_HOTPLUG_CPU is defined. Signed-off-by: Richard A. Holden III <aciddeath@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-20 09:03:04 +01:00
Ingo Molnar	fbc2a06056	Merge branch 'linus' into x86/uv	2008-11-20 09:02:39 +01:00
Linus Torvalds	3108864e2d	Merge branch 'x86/numa' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86/numa' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: make NUMA on 32-bit depend on EXPERIMENTAL again x86, hibernate: fix breakage on x86_32 with CONFIG_NUMA set	2008-11-19 18:53:02 -08:00
Linus Torvalds	4f7dbc7ff4	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: more general identifier for Phoenix BIOS AMD IOMMU: check for next_bit also in unmapped area AMD IOMMU: fix fullflush comparison length AMD IOMMU: enable device isolation per default AMD IOMMU: add parameter to disable device isolation x86, PEBS/DS: fix code flow in ds_request() x86: add rdtsc barrier to TSC sync check xen: fix scrub_page() x86: fix es7000 compiling x86, bts: fix unlock problem in ds.c x86, voyager: fix smp generic helper voyager breakage x86: move iomap.h to the new include location	2008-11-19 18:51:56 -08:00
Ulrich Drepper	de11defebf	reintroduce accept4 Introduce a new accept4() system call. The addition of this system call matches analogous changes in 2.6.27 (dup3(), evenfd2(), signalfd4(), inotify_init1(), epoll_create1(), pipe2()) which added new system calls that differed from analogous traditional system calls in adding a flags argument that can be used to access additional functionality. The accept4() system call is exactly the same as accept(), except that it adds a flags bit-mask argument. Two flags are initially implemented. (Most of the new system calls in 2.6.27 also had both of these flags.) SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled for the new file descriptor returned by accept4(). This is a useful security feature to avoid leaking information in a multithreaded program where one thread is doing an accept() at the same time as another thread is doing a fork() plus exec(). More details here: http://udrepper.livejournal.com/20407.html "Secure File Descriptor Handling", Ulrich Drepper). The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag to be enabled on the new open file description created by accept4(). (This flag is merely a convenience, saving the use of additional calls fcntl(F_GETFL) and fcntl (F_SETFL) to achieve the same result. Here's a test program. Works on x86-32. Should work on x86-64, but I (mtk) don't have a system to hand to test with. It tests accept4() with each of the four possible combinations of SOCK_CLOEXEC and SOCK_NONBLOCK set/clear in 'flags', and verifies that the appropriate flags are set on the file descriptor/open file description returned by accept4(). I tested Ulrich's patch in this thread by applying against 2.6.28-rc2, and it passes according to my test program. /* test_accept4.c Copyright (C) 2008, Linux Foundation, written by Michael Kerrisk <mtk.manpages@gmail.com> Licensed under the GNU GPLv2 or later. / #define _GNU_SOURCE #include <unistd.h> #include <sys/syscall.h> #include <sys/socket.h> #include <netinet/in.h> #include <stdlib.h> #include <fcntl.h> #include <stdio.h> #include <string.h> #define PORT_NUM 33333 #define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0) /********************************************************************/ / The following is what we need until glibc gets a wrapper for accept4() / / Flags for socket(), socketpair(), accept4() / #ifndef SOCK_CLOEXEC #define SOCK_CLOEXEC O_CLOEXEC #endif #ifndef SOCK_NONBLOCK #define SOCK_NONBLOCK O_NONBLOCK #endif #ifdef __x86_64__ #define SYS_accept4 288 #elif __i386__ #define USE_SOCKETCALL 1 #define SYS_ACCEPT4 18 #else #error "Sorry -- don't know the syscall # on this architecture" #endif static int accept4(int fd, struct sockaddr sockaddr, socklen_t addrlen, int flags) { printf("Calling accept4(): flags = %x", flags); if (flags != 0) { printf(" ("); if (flags & SOCK_CLOEXEC) printf("SOCK_CLOEXEC"); if ((flags & SOCK_CLOEXEC) && (flags & SOCK_NONBLOCK)) printf(" "); if (flags & SOCK_NONBLOCK) printf("SOCK_NONBLOCK"); printf(")"); } printf("\n"); #if USE_SOCKETCALL long args[6]; args[0] = fd; args[1] = (long) sockaddr; args[2] = (long) addrlen; args[3] = flags; return syscall(SYS_socketcall, SYS_ACCEPT4, args); #else return syscall(SYS_accept4, fd, sockaddr, addrlen, flags); #endif } /********************************************************************/ static int do_test(int lfd, struct sockaddr_in conn_addr, int closeonexec_flag, int nonblock_flag) { int connfd, acceptfd; int fdf, flf, fdf_pass, flf_pass; struct sockaddr_in claddr; socklen_t addrlen; printf("=======================================\n"); connfd = socket(AF_INET, SOCK_STREAM, 0); if (connfd == -1) die("socket"); if (connect(connfd, (struct sockaddr ) conn_addr, sizeof(struct sockaddr_in)) == -1) die("connect"); addrlen = sizeof(struct sockaddr_in); acceptfd = accept4(lfd, (struct sockaddr ) &claddr, &addrlen, closeonexec_flag \| nonblock_flag); if (acceptfd == -1) { perror("accept4()"); close(connfd); return 0; } fdf = fcntl(acceptfd, F_GETFD); if (fdf == -1) die("fcntl:F_GETFD"); fdf_pass = ((fdf & FD_CLOEXEC) != 0) == ((closeonexec_flag & SOCK_CLOEXEC) != 0); printf("Close-on-exec flag is %sset (%s); ", (fdf & FD_CLOEXEC) ? "" : "not ", fdf_pass ? "OK" : "failed"); flf = fcntl(acceptfd, F_GETFL); if (flf == -1) die("fcntl:F_GETFD"); flf_pass = ((flf & O_NONBLOCK) != 0) == ((nonblock_flag & SOCK_NONBLOCK) !=0); printf("nonblock flag is %sset (%s)\n", (flf & O_NONBLOCK) ? "" : "not ", flf_pass ? "OK" : "failed"); close(acceptfd); close(connfd); printf("Test result: %s\n", (fdf_pass && flf_pass) ? "PASS" : "FAIL"); return fdf_pass && flf_pass; } static int create_listening_socket(int port_num) { struct sockaddr_in svaddr; int lfd; int optval; memset(&svaddr, 0, sizeof(struct sockaddr_in)); svaddr.sin_family = AF_INET; svaddr.sin_addr.s_addr = htonl(INADDR_ANY); svaddr.sin_port = htons(port_num); lfd = socket(AF_INET, SOCK_STREAM, 0); if (lfd == -1) die("socket"); optval = 1; if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval)) == -1) die("setsockopt"); if (bind(lfd, (struct sockaddr ) &svaddr, sizeof(struct sockaddr_in)) == -1) die("bind"); if (listen(lfd, 5) == -1) die("listen"); return lfd; } int main(int argc, char argv[]) { struct sockaddr_in conn_addr; int lfd; int port_num; int passed; passed = 1; port_num = (argc > 1) ? atoi(argv[1]) : PORT_NUM; memset(&conn_addr, 0, sizeof(struct sockaddr_in)); conn_addr.sin_family = AF_INET; conn_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); conn_addr.sin_port = htons(port_num); lfd = create_listening_socket(port_num); if (!do_test(lfd, &conn_addr, 0, 0)) passed = 0; if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, 0)) passed = 0; if (!do_test(lfd, &conn_addr, 0, SOCK_NONBLOCK)) passed = 0; if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, SOCK_NONBLOCK)) passed = 0; close(lfd); exit(passed ? EXIT_SUCCESS : EXIT_FAILURE); } [mtk.manpages@gmail.com: rewrote changelog, updated test program] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Tested-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: <linux-api@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-11-19 18:49:57 -08:00
Ingo Molnar	9676e73a9e	Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/core Conflicts: kernel/trace/ftrace.c [ We conflicted here because we backported a few fixes to tracing/urgent - which has different internal APIs. ]	2008-11-19 10:04:25 +01:00
Ingo Molnar	3ac3ba0b39	Merge branch 'linus' into sched/core Conflicts: kernel/Makefile	2008-11-19 09:44:37 +01:00
Hiroshi Shimamoto	20a4a236c7	x86: uaccess_64: fix return value in __copy_from_user() __copy_from_user() will return invalid value 16 when it fails to access user space and the size is 10. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-18 22:28:58 +01:00
Steve Conklin	093bac154c	x86: quirk for reboot stalls on a Dell Optiplex 330 Dell Optiplex 330 appears to hang on reboot. This is resolved by adding a quirk to set bios reboot. Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com> Signed-off-by: Steve Conklin <steve.conklin@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-18 22:22:29 +01:00
Yinghai Lu	b5fe363b7d	x86: use update_genapic to get rid of ES7000_CLUSTERED_APIC v2 Impact: clean up We can autodetect those system that need cluster apic, and update genapic accordingly. We can also remove wakeup.h for e7000, because it's default one is now the same as overall default mach_wakecpu.h Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-18 17:35:40 +01:00
Ingo Molnar	f632ddcc07	x86: fix wakeup_cpu with numaq/es7000, v2, fix #2 Impact: fix boot crash fix default_update_genapic(). Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-18 17:35:19 +01:00
Hiroshi Shimamoto	64977609e3	x86: ia32_signal: change order of storing in setup_sigcontext() Impact: cleanup Change order of storing to match the sigcontext_ia32. And add casting to make this code same as arch/x86/kernel/signal_32.c. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-18 16:55:37 +01:00
Hiroshi Shimamoto	047ce93581	x86: ia32_signal: remove using temporary variable Impact: cleanup No need to use temporary variable. Also rename the variable same as arch/x86/kernel/signal_32.c. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Reviewed-by: WANG Cong <wangcong@zeuux.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-18 16:55:36 +01:00
Hiroshi Shimamoto	8c6e5ce0fd	x86: ia32_signal: cleanup macro RELOAD_SEG Impact: cleanup Remove mask parameter because it's always 3. Cleanup coding styles. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Reviewed-by: WANG Cong <wangcong@zeuux.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-18 16:55:35 +01:00
Hiroshi Shimamoto	d71a68dca5	x86: ia32_signal: introduce COPY_SEG_CPL3 Impact: cleanup Introduce COPY_SEG_CPL3 for ia32_restore_sigcontext(). Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-18 16:55:34 +01:00
Hiroshi Shimamoto	b78a5b5260	x86: ia32_signal: cleanup macro COPY Impact: cleanup No need to use temporary variable in this case. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-18 16:55:33 +01:00
Ingo Molnar	73f56c0d35	Merge branch 'iommu-fixes-2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent	2008-11-18 16:48:49 +01:00
Philipp Kohlbecher	0af40a4b10	x86: more general identifier for Phoenix BIOS Impact: widen the reach of the low-memory-protect DMI quirk Phoenix BIOSes variously identify their vendor as "Phoenix Technologies, LTD" or "Phoenix Technologies LTD" (without the comma.) This patch makes the identification string in the bad_bios_dmi_table more general (following a suggestion by Ingo Molnar), so that both versions are handled. Again, the patched file compiles cleanly and the patch has been tested successfully on my machine. Signed-off-by: Philipp Kohlbecher <xt28@gmx.de> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-18 16:11:36 +01:00
Joerg Roedel	8501c45cc3	AMD IOMMU: check for next_bit also in unmapped area Impact: fix possible use of stale IO/TLB entries Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2008-11-18 15:44:43 +01:00
Joerg Roedel	695b5676c7	AMD IOMMU: fix fullflush comparison length Impact: fix comparison length for 'fullflush' Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2008-11-18 15:44:42 +01:00
Joerg Roedel	3ce1f93c6d	AMD IOMMU: enable device isolation per default Impact: makes device isolation the default for AMD IOMMU Some device drivers showed double-free bugs of DMA memory while testing them with AMD IOMMU. If all devices share the same protection domain this can lead to data corruption and data loss. Prevent this by putting each device into its own protection domain per default. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2008-11-18 15:44:31 +01:00
Joerg Roedel	e5e1f606ec	AMD IOMMU: add parameter to disable device isolation Impact: add a new AMD IOMMU kernel command line parameter Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2008-11-18 15:43:23 +01:00
Ingo Molnar	cbe9ee00ce	Merge branch 'x86/urgent' into x86/cleanups	2008-11-18 15:41:36 +01:00
Ingo Molnar	10db4ef7b9	x86, PEBS/DS: fix code flow in ds_request() this compiler warning: arch/x86/kernel/ds.c: In function 'ds_request': arch/x86/kernel/ds.c:368: warning: 'context' may be used uninitialized in this function Shows that the code flow in ds_request() is buggy - it goes into the unlock+release-context path even when the context is not allocated yet. First allocate the context, then do the other checks. Also, take care with GFP allocations under the ds_lock spinlock. Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-18 15:34:36 +01:00
Joerg Roedel	a1afd01c17	x86: default to SWIOTLB=y on x86_64 Impact: fixes korg bugzilla 11980 A kernel for a 64bit x86 system should always contain the swiotlb code in case it is booted on a machine without any hardware IOMMU supported by the kernel and more than 4GB of RAM. This patch changes Kconfig to always compile swiotlb into the kernel for x86_64. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: stable@kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-18 14:52:46 +01:00
Frederic Weisbecker	0231022cc3	tracing/function-return-tracer: add the overrun field Impact: help to find the better depth of trace We decided to arbitrary define the depth of function return trace as "20". Perhaps this is not enough. To help finding an optimal depth, we measure now the overrun: the number of functions that have been missed for the current thread. By default this is not displayed, we have to do set a particular flag on the return tracer: echo overrun > /debug/tracing/trace_options And the overrun will be printed on the right. As the trace shows below, the current 20 depth is not enough. update_wall_time+0x37f/0x8c0 -> update_xtime_cache (345 ns) (Overruns: 2838) update_wall_time+0x384/0x8c0 -> clocksource_get_next (1141 ns) (Overruns: 2838) do_timer+0x23/0x100 -> update_wall_time (3882 ns) (Overruns: 2838) tick_do_update_jiffies64+0xbf/0x160 -> do_timer (5339 ns) (Overruns: 2838) tick_sched_timer+0x6a/0xf0 -> tick_do_update_jiffies64 (7209 ns) (Overruns: 2838) vgacon_set_cursor_size+0x98/0x120 -> native_io_delay (2613 ns) (Overruns: 274) vgacon_cursor+0x16e/0x1d0 -> vgacon_set_cursor_size (33151 ns) (Overruns: 274) set_cursor+0x5f/0x80 -> vgacon_cursor (36432 ns) (Overruns: 274) con_flush_chars+0x34/0x40 -> set_cursor (38790 ns) (Overruns: 274) release_console_sem+0x1ec/0x230 -> up (721 ns) (Overruns: 274) release_console_sem+0x225/0x230 -> wake_up_klogd (316 ns) (Overruns: 274) con_flush_chars+0x39/0x40 -> release_console_sem (2996 ns) (Overruns: 274) con_write+0x22/0x30 -> con_flush_chars (46067 ns) (Overruns: 274) n_tty_write+0x1cc/0x360 -> con_write (292670 ns) (Overruns: 274) smp_apic_timer_interrupt+0x2a/0x90 -> native_apic_mem_write (330 ns) (Overruns: 274) irq_enter+0x17/0x70 -> idle_cpu (413 ns) (Overruns: 274) smp_apic_timer_interrupt+0x2f/0x90 -> irq_enter (1525 ns) (Overruns: 274) ktime_get_ts+0x40/0x70 -> getnstimeofday (465 ns) (Overruns: 274) ktime_get_ts+0x60/0x70 -> set_normalized_timespec (436 ns) (Overruns: 274) ktime_get+0x16/0x30 -> ktime_get_ts (2501 ns) (Overruns: 274) hrtimer_interrupt+0x77/0x1a0 -> ktime_get (3439 ns) (Overruns: 274) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-18 11:11:00 +01:00
James Morris	f3a5c54701	Merge branch 'master' into next Conflicts: fs/cifs/misc.c Merge to resolve above, per the patch below. Signed-off-by: James Morris <jmorris@namei.org> diff --cc fs/cifs/misc.c index ec36410,addd1dc..0000000 --- a/fs/cifs/misc.c +++ b/fs/cifs/misc.c @@@ -347,13 -338,13 +338,13 @@@ header_assemble(struct smb_hdr buffer / BB Add support for establishing new tCon and SMB Session / / with userid/password pairs found on the smb session / / for other target tcp/ip addresses BB */ - if (current->fsuid != treeCon->ses->linux_uid) { + if (current_fsuid() != treeCon->ses->linux_uid) { cFYI(1, ("Multiuser mode and UID " "did not match tcon uid")); - read_lock(&GlobalSMBSeslock); - list_for_each(temp_item, &GlobalSMBSessionList) { - ses = list_entry(temp_item, struct cifsSesInfo, cifsSessionList); + read_lock(&cifs_tcp_ses_lock); + list_for_each(temp_item, &treeCon->ses->server->smb_ses_list) { + ses = list_entry(temp_item, struct cifsSesInfo, smb_ses_list); - if (ses->linux_uid == current->fsuid) { + if (ses->linux_uid == current_fsuid()) { if (ses->server == treeCon->ses->server) { cFYI(1, ("found matching uid substitute right smb_uid")); buffer->Uid = ses->Suid;	2008-11-18 18:52:37 +11:00
Yinghai Lu	54ac14a8e9	x86: fix wakeup_cpu with numaq/es7000, v2, fix Impact: fix wakeup_secondary_cpu with hotplug We can not put that into x86_quirks, because that is __initdata. So try to move that to genapic, and add update_genapic in x86_quirks. later we even could use that stub to: 1. autodetect CONFIG_ES7000_CLUSTERED_APIC 2. more correct inquire_remote_apic with apic_verbosity setting. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-18 00:27:24 +01:00
Venki Pallipadi	93ce99e849	x86: add rdtsc barrier to TSC sync check Impact: fix incorrectly marked unstable TSC clock Patch (commit `0d12cdd` "sched: improve sched_clock() performance") has a regression on one of the test systems here. With the patch, I see: checking TSC synchronization [CPU#0 -> CPU#1]: Measured 28 cycles TSC warp between CPUs, turning off TSC clock. Marking TSC unstable due to check_tsc_sync_source failed Whereas, without the patch syncs pass fine on all CPUs: checking TSC synchronization [CPU#0 -> CPU#1]: passed. Due to this, TSC is marked unstable, when it is not actually unstable. This is because syncs in check_tsc_wrap() goes away due to this commit. As per the discussion on this thread, correct way to fix this is to add explicit syncs as below? Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-18 00:15:02 +01:00
Eric Dumazet	a4a16beade	oprofile: fix an overflow in ppro code reset_value was changed from long to u64 in commit `b991702884` (oprofile: Implement Intel architectural perfmon support) But dynamic allocation of this array use a wrong type (long instead of u64) Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Robert Richter <robert.richter@amd.com>	2008-11-17 18:47:36 +01:00
Yinghai Lu	569712b2b0	x86: fix wakeup_cpu with numaq/es7000, v2 Impact: fix secondary-CPU wakeup/init path with numaq and es7000 While looking at wakeup_secondary_cpu for WAKE_SECONDARY_VIA_NMI: \|#ifdef WAKE_SECONDARY_VIA_NMI \|/* \| * Poke the other CPU in the eye via NMI to wake it up. Remember that the normal \| * INIT, INIT, STARTUP sequence will reset the chip hard for us, and this \| * won't ... remember to clear down the APIC, etc later. \| / \|static int __devinit \|wakeup_secondary_cpu(int logical_apicid, unsigned long start_eip) \|{ \| unsigned long send_status, accept_status = 0; \| int maxlvt; \|... \| if (APIC_INTEGRATED(apic_version[phys_apicid])) { \| maxlvt = lapic_get_maxlvt(); I noticed that there is no warning about undefined phys_apicid... because WAKE_SECONDARY_VIA_NMI and WAKE_SECONDARY_VIA_INIT can not be defined at the same time. So NUMAQ is using wrong wakeup_secondary_cpu. WAKE_SECONDARY_VIA_NMI, WAKE_SECONDARY_VIA_INIT and WAKE_SECONDARY_VIA_MIP are variants of a weird and fragile preprocessor-driven "HAL" mechanisms to specify the kind of secondary-CPU wakeup strategy a given x86 kernel will use. The vast majority of systems want to use INIT for secondary wakeup - NUMAQ uses an NMI, (old-style-) ES7000 uses 'MIP' (a firmware driven in-memory flag to let secondaries continue). So convert these mechanisms to x86_quirks and add a ->wakeup_secondary_cpu() method to specify the rare exception to the sane default. Extend genapic accordingly as well, for 32-bit. While looking further, I noticed that functions in wakecup.h for numaq and es7000 are different to the default in mach_wakecpu.h - but smpboot.c will only use default mach_wakecpu.h with smphook.h. So we need to add mach_wakecpu.h for mach_generic, to properly support numaq and es7000, and vectorize the following SMP init methods: int trampoline_phys_low; int trampoline_phys_high; void (wait_for_init_deassert)(atomic_t deassert); void (smp_callin_clear_local_apic)(void); void (store_NMI_vector)(unsigned short high, unsigned short low); void (restore_NMI_vector)(unsigned short high, unsigned short low); void (*inquire_remote_apic)(int apicid); Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-17 17:57:34 +01:00
Alexander van Heukelum	0bd7b79851	x86: entry_64.S: remove whitespace at end of lines Impact: cleanup All blame goes to: color white,red "[^[:graph:]]+$" in .nanorc ;). Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-17 10:46:55 +01:00
Ingo Molnar	9dacc71ff3	Merge commit 'v2.6.28-rc5' into x86/cleanups	2008-11-17 10:46:18 +01:00
Yinghai Lu	d3c6aa1e69	x86: fix es7000 compiling Impact: fix es7000 build CC arch/x86/kernel/es7000_32.o arch/x86/kernel/es7000_32.c: In function find_unisys_acpi_oem_table: arch/x86/kernel/es7000_32.c:255: error: implicit declaration of function acpi_get_table_with_size arch/x86/kernel/es7000_32.c:261: error: implicit declaration of function early_acpi_os_unmap_memory arch/x86/kernel/es7000_32.c: In function unmap_unisys_acpi_oem_table: arch/x86/kernel/es7000_32.c:277: error: implicit declaration of function __acpi_unmap_table make[1]: *** [arch/x86/kernel/es7000_32.o] Error 1 we applied one patch out of order... \| commit `a73aaedd95` \| Author: Yinghai Lu <yhlu.kernel@gmail.com> \| Date: Sun Sep 14 02:33:14 2008 -0700 \| \| x86: check dsdt before find oem table for es7000, v2 \| \| v2: use __acpi_unmap_table() that patch need: x86: use early_ioremap in __acpi_map_table x86: always explicitly map acpi memory acpi: remove final __acpi_map_table mapping before setting acpi_gbl_permanent_mmap acpi/x86: introduce __apci_map_table, v4 submitted to the ACPI tree but not upstream yet. fix it until those patches applied, need to revert this one Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-16 10:05:07 +01:00
Markus Metzger	d1f1e9c010	x86, bts: fix unlock problem in ds.c Fix a problem where ds_request() returned an error without releasing the ds lock. Reported-by: Stephane Eranian <eranian@gmail.com> Signed-off-by: Markus Metzger <markus.t.metzger@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-16 08:25:36 +01:00
Frederic Weisbecker	e7d3737ea1	tracing/function-return-tracer: support for dynamic ftrace on function return tracer This patch adds the support for dynamic tracing on the function return tracer. The whole difference with normal dynamic function tracing is that we don't need to hook on a particular callback. The only pro that we want is to nop or set dynamically the calls to ftrace_caller (which is ftrace_return_caller here). Some security checks ensure that we are not trying to launch dynamic tracing for return tracing while normal function tracing is already running. An example of trace with getnstimeofday set as a filter: ktime_get_ts+0x22/0x50 -> getnstimeofday (2283 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1396 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1825 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1426 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1524 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1434 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1502 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1404 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1397 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1051 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1314 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1344 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1163 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1390 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1374 ns) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-16 07:57:38 +01:00
Frederic Weisbecker	b01c746617	tracing/function-return-tracer: add a barrier to ensure return stack index is incremented in memory Impact: fix possible race condition in ftrace function return tracer This fixes a possible race condition if index incrementation is not immediately flushed in memory. Thanks for Andi Kleen and Steven Rostedt for pointing out this issue and give me this solution. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-16 07:57:37 +01:00
Steven Rostedt	31e889098a	ftrace: pass module struct to arch dynamic ftrace functions Impact: allow archs more flexibility on dynamic ftrace implementations Dynamic ftrace has largly been developed on x86. Since x86 does not have the same limitations as other architectures, the ftrace interaction between the generic code and the architecture specific code was not flexible enough to handle some of the issues that other architectures have. Most notably, module trampolines. Due to the limited branch distance that archs make in calling kernel core code from modules, the module load code must create a trampoline to jump to what will make the larger jump into core kernel code. The problem arises when this happens to a call to mcount. Ftrace checks all code before modifying it and makes sure the current code is what it expects. Right now, there is not enough information to handle modifying module trampolines. This patch changes the API between generic dynamic ftrace code and the arch dependent code. There is now two functions for modifying code: ftrace_make_nop(mod, rec, addr) - convert the code at rec->ip into a nop, where the original text is calling addr. (mod is the module struct if called by module init) ftrace_make_caller(rec, addr) - convert the code rec->ip that should be a nop into a caller to addr. The record "rec" now has a new field called "arch" where the architecture can add any special attributes to each call site record. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-16 07:36:02 +01:00
David Woodhouse	52168e60f7	Revert "x86: blacklist DMAR on Intel G31/G33 chipsets" This reverts commit `e51af66308`, which was wrongly hoovered up and submitted about a month after a better fix had already been merged. The better fix is commit `cbda1ba898` ("PCI/iommu: blacklist DMAR on Intel G31/G33 chipsets"), where we do this blacklisting based on the DMI identification for the offending motherboard, since sometimes this chipset (or at least a chipset with the same PCI ID) apparently _does_ actually have an IOMMU. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-11-15 11:37:16 -08:00
Alexander van Heukelum	722024dbb7	x86: irq: fix apicinterrupts on 64 bits Impact: Fix interrupt via the apicinterrupt macro Checkin `939b787130` changed the "interrupt" macro, but the "interrupt" macro is also invoked indirectly from the "apicinterrupt" macro. The "apicinterrupt" macro probably should have its own collection of systematic stubs for the same reason the main IRQ code does; as is it is a huge amount of replicated code. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-13 17:28:38 -08:00
James Morris	2b82892565	Merge branch 'master' into next Conflicts: security/keys/internal.h security/keys/process_keys.c security/keys/request_key.c Fixed conflicts above by using the non 'tsk' versions. Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 11:29:12 +11:00
David Howells	a6f76f23d2	CRED: Make execve() take advantage of copy-on-write credentials Make execve() take advantage of copy-on-write credentials, allowing it to set up the credentials in advance, and then commit the whole lot after the point of no return. This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). The credential bits from struct linux_binprm are, for the most part, replaced with a single credentials pointer (bprm->cred). This means that all the creds can be calculated in advance and then applied at the point of no return with no possibility of failure. I would like to replace bprm->cap_effective with: cap_isclear(bprm->cap_effective) but this seems impossible due to special behaviour for processes of pid 1 (they always retain their parent's capability masks where normally they'd be changed - see cap_bprm_set_creds()). The following sequence of events now happens: (a) At the start of do_execve, the current task's cred_exec_mutex is locked to prevent PTRACE_ATTACH from obsoleting the calculation of creds that we make. (a) prepare_exec_creds() is then called to make a copy of the current task's credentials and prepare it. This copy is then assigned to bprm->cred. This renders security_bprm_alloc() and security_bprm_free() unnecessary, and so they've been removed. (b) The determination of unsafe execution is now performed immediately after (a) rather than later on in the code. The result is stored in bprm->unsafe for future reference. (c) prepare_binprm() is called, possibly multiple times. (i) This applies the result of set[ug]id binaries to the new creds attached to bprm->cred. Personality bit clearance is recorded, but now deferred on the basis that the exec procedure may yet fail. (ii) This then calls the new security_bprm_set_creds(). This should calculate the new LSM and capability credentials into bprm->cred. This folds together security_bprm_set() and parts of security_bprm_apply_creds() (these two have been removed). Anything that might fail must be done at this point. (iii) bprm->cred_prepared is set to 1. bprm->cred_prepared is 0 on the first pass of the security calculations, and 1 on all subsequent passes. This allows SELinux in (ii) to base its calculations only on the initial script and not on the interpreter. (d) flush_old_exec() is called to commit the task to execution. This performs the following steps with regard to credentials: (i) Clear pdeath_signal and set dumpable on certain circumstances that may not be covered by commit_creds(). (ii) Clear any bits in current->personality that were deferred from (c.i). (e) install_exec_creds() [compute_creds() as was] is called to install the new credentials. This performs the following steps with regard to credentials: (i) Calls security_bprm_committing_creds() to apply any security requirements, such as flushing unauthorised files in SELinux, that must be done before the credentials are changed. This is made up of bits of security_bprm_apply_creds() and security_bprm_post_apply_creds(), both of which have been removed. This function is not allowed to fail; anything that might fail must have been done in (c.ii). (ii) Calls commit_creds() to apply the new credentials in a single assignment (more or less). Possibly pdeath_signal and dumpable should be part of struct creds. (iii) Unlocks the task's cred_replace_mutex, thus allowing PTRACE_ATTACH to take place. (iv) Clears The bprm->cred pointer as the credentials it was holding are now immutable. (v) Calls security_bprm_committed_creds() to apply any security alterations that must be done after the creds have been changed. SELinux uses this to flush signals and signal handlers. (f) If an error occurs before (d.i), bprm_free() will call abort_creds() to destroy the proposed new credentials and will then unlock cred_replace_mutex. No changes to the credentials will have been made. (2) LSM interface. A number of functions have been changed, added or removed: () security_bprm_alloc(), ->bprm_alloc_security() () security_bprm_free(), ->bprm_free_security() Removed in favour of preparing new credentials and modifying those. () security_bprm_apply_creds(), ->bprm_apply_creds() () security_bprm_post_apply_creds(), ->bprm_post_apply_creds() Removed; split between security_bprm_set_creds(), security_bprm_committing_creds() and security_bprm_committed_creds(). () security_bprm_set(), ->bprm_set_security() Removed; folded into security_bprm_set_creds(). () security_bprm_set_creds(), ->bprm_set_creds() New. The new credentials in bprm->creds should be checked and set up as appropriate. bprm->cred_prepared is 0 on the first call, 1 on the second and subsequent calls. () security_bprm_committing_creds(), ->bprm_committing_creds() (*) security_bprm_committed_creds(), ->bprm_committed_creds() New. Apply the security effects of the new credentials. This includes closing unauthorised files in SELinux. This function may not fail. When the former is called, the creds haven't yet been applied to the process; when the latter is called, they have. The former may access bprm->cred, the latter may not. (3) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) The bprm_security_struct struct has been removed in favour of using the credentials-under-construction approach. (c) flush_unauthorized_files() now takes a cred pointer and passes it on to inode_has_perm(), file_has_perm() and dentry_open(). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:24 +11:00
David Howells	350b4da71f	CRED: Wrap task credential accesses in the x86 arch Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(\|e\|s\|fs)[ug]id to current_(\|e\|s\|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:38:40 +11:00
Ingo Molnar	24de38620d	Merge branches 'tracing/branch-tracer', 'tracing/fastboot', 'tracing/function-return-tracer' and 'tracing/urgent' into tracing/core	2008-11-13 09:48:03 +01:00
Rafael J. Wysocki	604d205548	x86: make NUMA on 32-bit depend on EXPERIMENTAL again My previous patch to make CONFIG_NUMA on x86_32 depend on BROKEN turned out to be unnecessary, after all, since the source of the hibernation vs CONFIG_NUMA problem turned out to be the fact that we didn't take the NUMA KVA remapping into account in the hibernation code. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 23:28:52 +01:00
Rafael J. Wysocki	97a70e548b	x86, hibernate: fix breakage on x86_32 with CONFIG_NUMA set Impact: fix crash during hibernation on 32-bit NUMA The NUMA code on x86_32 creates special memory mapping that allows each node's pgdat to be located in this node's memory. For this purpose it allocates a memory area at the end of each node's memory and maps this area so that it is accessible with virtual addresses belonging to low memory. As a result, if there is high memory, these NUMA-allocated areas are physically located in high memory, although they are mapped to low memory addresses. Our hibernation code does not take that into account and for this reason hibernation fails on all x86_32 systems with CONFIG_NUMA=y and with high memory present. Fix this by adding a special mapping for the NUMA-allocated memory areas to the temporary page tables created during the last phase of resume. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 23:28:51 +01:00
Frederic Weisbecker	1dc1c6adf3	tracing/function-return-tracer: call prepare_ftrace_return by registers Impact: Optimize a bit the function return tracer This patch changes the calling convention of prepare_ftrace_return to pass its arguments by register. This will optimize it a bit and prepare it to support dynamic tracing. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 23:15:43 +01:00
Frederic Weisbecker	62d59d17a5	tracing/function-return-tracer: make the function return tracer lockless Impact: remove spinlocks and irq disabling in function return tracer. I've tried to figure out all of the race condition that could happen when the tracer pushes or pops a return address trace to/from the current thread_info. Theory: _ One thread can only execute on one cpu at a time. So this code doesn't need to be SMP-safe. Just drop the spinlock. _ The only race could happen between the current thread and an interrupt. If an interrupt is raised, it will increase the index of the return stack storage and then execute until the end of the tracing to finally free the index it used. We don't need to disable irqs. This is theorical. In practice, I've tested it with a two-core SMP and had no problem at all. Perhaps -tip testing could confirm it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 23:15:43 +01:00
Steven Rostedt	2ed84eeb88	trace: rename unlikely profiler to branch profiler Impact: name change of unlikely tracer and profiler Ingo Molnar suggested changing the config from UNLIKELY_PROFILE to BRANCH_PROFILING. I never did like the "unlikely" name so I went one step farther, and renamed all the unlikely configurations to a "BRANCH" variant. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 22:27:58 +01:00
Prarit Bhargava	8652cb4b0d	x86: warn of incorrect cpu_khz on AMD systems Impact: add debug check If none of the perfctrs are free when calculating cpu_khz we default to using ctr 3 (ie, we just choose 3). This may lead to an incorrect tsc freq value which can cause the system to be unstable. To aid in future debugging, WARN the user of a potential problem. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 19:57:35 +01:00
Linus Torvalds	5d2007ebc2	Merge branch 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: KVM: Fix pit memory leak if unable to allocate irq source id KVM: ia64: fix vmm_spin_{un}lock for !CONFIG_SMP KVM: VMX: Set IGMT bit in EPT entry KVM: Require the PCI subsystem x86: KVM guest: fix section mismatch warning in kvmclock.c KVM: ia64: Use guest signal mask when blocking KVM: MMU: increase per-vcpu rmap cache alloc size	2008-11-12 10:38:42 -08:00
H. Peter Anvin	8665596ec0	x86: fix up the new IRQ code for older versions of gas Older versions of gas don't implement the C-style != operator, they instead want the Pascal-style <> operator. Change != to <> so we don't break compilation with those old versions of gas. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-12 10:27:35 -08:00
Linus Torvalds	08c1184fa2	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (47 commits) ACPI: pci_link: remove acpi_irq_balance_set() interface fujitsu-laptop: Add DMI callback for Lifebook S6420 ACPI: EC: Don't do transaction from GPE handler in poll mode. ACPI: EC: lower interrupt storm treshold ACPICA: Use spinlock for acpi_{en\|dis}able_gpe ACPI: EC: restart failed command ACPI: EC: wait for last write gpe ACPI: EC: make kernel messages more useful when GPE storm is detected ACPI: EC: revert msleep patch thinkpad_acpi: fingers off backlight if video.ko is serving this functionality sony-laptop: fingers off backlight if video.ko is serving this functionality msi-laptop: fingers off backlight if video.ko is serving this functionality fujitsu-laptop: fingers off backlight if video.ko is serving this functionality eeepc-laptop: fingers off backlight if video.ko is serving this functionality compal: fingers off backlight if video.ko is serving this functionality asus-acpi: fingers off backlight if video.ko is serving this functionality Acer-WMI: fingers off backlight if video.ko is serving this functionality ACPI video: if no ACPI backlight support, use vendor drivers ACPI: video: Ignore devices that aren't present in hardware Delete an unwanted return statement at evgpe.c ...	2008-11-12 10:24:46 -08:00
Eduardo Habkost	c415b3dce3	x86: disable IRQs before doing anything on nmi_shootdown_cpus() Impact: make nmi_shootdown_cpus() callable from preemptible context We need to know on which CPU we are running on, and we don't want to be preempted while doing this. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 18:55:49 +01:00
Eduardo Habkost	bb8dd270e6	x86: make nmi_shootdown_cpus() available on !SMP and !X86_LOCAL_APIC Impact: widen nmi_shootdown_cpus() availability The X86_LOCAL_APIC #ifdef was for kdump. For !SMP, the function simply does nothing. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 18:55:48 +01:00
Eduardo Habkost	2ddded2138	x86: move nmi_shootdown_cpus() to reboot.c Impact: make nmi_shootdown_cpus() available to the rest of the x86 platform Now nmi_shootdown_cpus() is ready to be used by non-kdump code also. Move it to reboot.c. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 18:55:47 +01:00
Eduardo Habkost	c370e5e089	x86 kdump: make nmi_shootdown_cpus() non-static Impact: make API available to the rest of x86 platform code Add prototype to asm/reboot.h. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 18:55:46 +01:00
Eduardo Habkost	8e29478631	x86 kdump: make kdump_nmi_callback() a function ptr on crash_nmi_callback() Impact: extend nmi_shootdown_cpus() with a callback The reboot code will use a different function on crash_nmi_callback(). Adding a function pointer parameter to nmi_shootdown_cpus() for that. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 18:55:46 +01:00
Eduardo Habkost	d1e7b91cfa	x86 kdump: create kdump_nmi_shootdown_cpus() Impact: cleanup For the kdump-specific code that was living on nmi_shootdown_cpus(). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 18:55:45 +01:00
Eduardo Habkost	b2bbe71b82	x86 kdump: move crashing_cpu assignment to nmi_shootdown_cpus() Impact: cleanup This variable will be moved to non-kdump-specific code. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 18:55:44 +01:00
Eduardo Habkost	a7d41820f6	x86 kdump: extract kdump-specific code from crash_nmi_callback() Impact: cleanup The NMI CPU-halting code will be used on non-kdump cases, also (e.g. emergency_reboot when virtualization is enabled). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 18:55:43 +01:00
Ingo Molnar	eb42c75878	Merge branch 'linus' into x86/crashdump	2008-11-12 15:43:39 +01:00
Ingo Molnar	2b7d0390a6	tracing: branch tracer, fix vdso crash Impact: fix bootup crash the branch tracer missed arch/x86/vdso/vclock_gettime.c from disabling tracing, which caused such bootup crashes: [ 201.840097] init[1]: segfault at 7fffed3fe7c0 ip 00007fffed3fea2e sp 000077 also clean up the ugly ifdefs in arch/x86/kernel/vsyscall_64.c by creating DISABLE_UNLIKELY_PROFILE facility for code to turn off instrumentation on a per file basis. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 13:26:38 +01:00
Hiroshi Shimamoto	9cc3c49ed1	x86: ia32_signal: remove unnecessary padding Impact: reduce structure padding Remove unnecessary paddings, this saves 4 bytes. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 12:28:02 +01:00
Hiroshi Shimamoto	4a61204856	x86: signal_32: introduce retcode and rt_retcode Impact: cleanup Introduce retcode and rt_retcode to replace setting up frame->retcode. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 12:28:01 +01:00
Steven Rostedt	1f0d69a9fc	tracing: profile likely and unlikely annotations Impact: new unlikely/likely profiler Andrew Morton recently suggested having an in-kernel way to profile likely and unlikely macros. This patch achieves that goal. When configured, every() likely and unlikely macro gets a counter attached to it. When the condition is hit, the hit and misses of that condition are recorded. These numbers can later be retrieved by: /debugfs/tracing/profile_likely - All likely markers /debugfs/tracing/profile_unlikely - All unlikely markers. # cat /debug/tracing/profile_unlikely \| head correct incorrect % Function File Line ------- --------- - -------- ---- ---- 2167 0 0 do_arch_prctl process_64.c 832 0 0 0 do_arch_prctl process_64.c 804 2670 0 0 IS_ERR err.h 34 71230 5693 7 __switch_to process_64.c 673 76919 0 0 __switch_to process_64.c 639 43184 33743 43 __switch_to process_64.c 624 12740 64181 83 __switch_to process_64.c 594 12740 64174 83 __switch_to process_64.c 590 # cat /debug/tracing/profile_unlikely \| \ awk '{ if ($3 > 25) print $0; }' \|head -20 44963 35259 43 __switch_to process_64.c 624 12762 67454 84 __switch_to process_64.c 594 12762 67447 84 __switch_to process_64.c 590 1478 595 28 syscall_get_error syscall.h 51 0 2821 100 syscall_trace_leave ptrace.c 1567 0 1 100 native_smp_prepare_cpus smpboot.c 1237 86338 265881 75 calc_delta_fair sched_fair.c 408 210410 108540 34 calc_delta_mine sched.c 1267 0 54550 100 sched_info_queued sched_stats.h 222 51899 66435 56 pick_next_task_fair sched_fair.c 1422 6 10 62 yield_task_fair sched_fair.c 982 7325 2692 26 rt_policy sched.c 144 0 1270 100 pre_schedule_rt sched_rt.c 1261 1268 48073 97 pick_next_task_rt sched_rt.c 884 0 45181 100 sched_info_dequeued sched_stats.h 177 0 15 100 sched_move_task sched.c 8700 0 15 100 sched_move_task sched.c 8690 53167 33217 38 schedule sched.c 4457 0 80208 100 sched_info_switch sched_stats.h 270 30585 49631 61 context_switch sched.c 2619 # cat /debug/tracing/profile_likely \| awk '{ if ($3 > 25) print $0; }' 39900 36577 47 pick_next_task sched.c 4397 20824 15233 42 switch_mm mmu_context_64.h 18 0 7 100 __cancel_work_timer workqueue.c 560 617 66484 99 clocksource_adjust timekeeping.c 456 0 346340 100 audit_syscall_exit auditsc.c 1570 38 347350 99 audit_get_context auditsc.c 732 0 345244 100 audit_syscall_entry auditsc.c 1541 38 1017 96 audit_free auditsc.c 1446 0 1090 100 audit_alloc auditsc.c 862 2618 1090 29 audit_alloc auditsc.c 858 0 6 100 move_masked_irq migration.c 9 1 198 99 probe_sched_wakeup trace_sched_switch.c 58 2 2 50 probe_wakeup trace_sched_wakeup.c 227 0 2 100 probe_wakeup_sched_switch trace_sched_wakeup.c 144 4514 2090 31 __grab_cache_page filemap.c 2149 12882 228786 94 mapping_unevictable pagemap.h 50 4 11 73 __flush_cpu_slab slub.c 1466 627757 330451 34 slab_free slub.c 1731 2959 61245 95 dentry_lru_del_init dcache.c 153 946 1217 56 load_elf_binary binfmt_elf.c 904 102 82 44 disk_put_part genhd.h 206 1 1 50 dst_gc_task dst.c 82 0 19 100 tcp_mss_split_point tcp_output.c 1126 As you can see by the above, there's a bit of work to do in rethinking the use of some unlikelys and likelys. Note: the unlikely case had 71 hits that were more than 25%. Note: After submitting my first version of this patch, Andrew Morton showed me a version written by Daniel Walker, where I picked up the following ideas from: 1) Using __builtin_constant_p to avoid profiling fixed values. 2) Using __FILE__ instead of instruction pointers. 3) Using the preprocessor to stop all profiling of likely annotations from vsyscall_64.c. Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar for their feed back on this patch. () Not ever unlikely is recorded, those that are used by vsyscalls (a few of them) had to have profiling disabled. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Theodore Tso <tytso@mit.edu> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 11:52:02 +01:00
Ingo Molnar	60a011c736	Merge branch 'tracing/function-return-tracer' into tracing/fastboot	2008-11-12 10:17:09 +01:00
Ingo Molnar	d06bbd6695	Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/core Conflicts: kernel/trace/ring_buffer.c	2008-11-12 10:11:37 +01:00
Len Brown	3e0fe36483	Merge branch 'misc' into release	2008-11-11 21:14:11 -05:00
Bjorn Helgaas	32836259ff	ACPI: pci_link: remove acpi_irq_balance_set() interface This removes the acpi_irq_balance_set() interface from the PCI interrupt link driver. x86 used acpi_irq_balance_set() to tell the PCI interrupt link driver to configure links to minimize IRQ sharing. But the link driver can easily figure out whether to turn on IRQ balancing based on the IRQ model (PIC/IOAPIC/etc), so we can get rid of that external interface. It's better for the driver to figure this out at init-time. If we set it externally via the x86 code, the interface reduces modularity, and we depend on the fact that acpi_process_madt() happens before we process the kernel command line. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>	2008-11-11 21:12:05 -05:00
H. Peter Anvin	14d7ca5c57	x86: attempt reboot via port CF9 if we have standard PCI ports Impact: Changes reboot behavior. If port CF9 seems to be safe to touch, attempt it before trying the keyboard controller. Port CF9 is not available on all chipsets (a significant but decreasing number of modern chipsets don't implement it), but port CF9 itself should in general be safe to poke (no ill effects if unimplemented) on any system which has PCI Configuration Method #1 or #2, as it falls inside the PCI configuration port range in both cases. No chipset without PCI is known to have port CF9, either, although an explicit "pci=bios" would mean we miss this and therefore don't use port CF9. An explicit "reboot=pci" can be used to force the use of port CF9. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-11 16:19:48 -08:00
H. Peter Anvin	939b787130	x86: 64 bits: shrink and align IRQ stubs Move the IRQ stub generation to assembly to simplify it and for consistency with 32 bits. Doing it in a C file with asm() statements doesn't help clarity, and it prevents some optimizations. Shrink the IRQ stubs down to just over four bytes per (we fit seven into a 32-byte chunk.) This shrinks the total icache consumption of the IRQ stubs down to an even kilobyte, if all of them are in active use. The downside is that we end up with a double jump, which could have a negative effect on some pipelines. The double jump is always inside the same cacheline on any modern chips. To get the most effect, cache-align the IRQ stubs. This makes the 64-bit code match changes already done to the 32-bit code, and should open up irqinit*.c for unification. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-11 13:51:52 -08:00
H. Peter Anvin	b7c6244f13	x86: 32 bits: shrink and align IRQ stubs Shrink the IRQ stubs on 32 bits down to just over four bytes per (we fit seven into a 32-byte chunk.) This shrinks the total icache consumption of the IRQ stubs down to an even kilobyte, if all of them are in active use. The downside is that we end up with a double jump, which could have a negative effect on some pipelines. The double jump is always inside the same cacheline on any modern chips (the exception being 486/Elan/Geode which have only 16-byte cachelines, but are unlikely to have too many interrupt sources.) To get the most effect, cache-align the IRQ stubs. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-11 13:24:58 -08:00
H. Peter Anvin	4687518c4c	x86: 32 bit: interrupt stub consistency with 64 bit Don't generate interrupt stubs for interrupt vectors below FIRST_EXTERNAL_VECTOR, and make the table of interrupt vectors (interrupt[]) __initconst. Both of these changes both conserve memory and improve consistency with 64 bits. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-11 13:03:07 -08:00
Avi Kivity	e17d1dc086	KVM: Fix pit memory leak if unable to allocate irq source id Reported-By: Daniel Marjamäki <danielm77@spray.se> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-11-11 21:01:51 +02:00
Sheng Yang	928d4bf747	KVM: VMX: Set IGMT bit in EPT entry There is a potential issue that, when guest using pagetable without vmexit when EPT enabled, guest would use PAT/PCD/PWT bits to index PAT msr for it's memory, which would be inconsistent with host side and would cause host MCE due to inconsistent cache attribute. The patch set IGMT bit in EPT entry to ignore guest PAT and use WB as default memory type to protect host (notice that all memory mapped by KVM should be WB). Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2008-11-11 21:00:37 +02:00
Avi Kivity	ca93e992fd	KVM: Require the PCI subsystem PCI device assignment makes calls to pci code, so require it to be built into the kernel. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-11-11 20:56:13 +02:00
Rakib Mullick	a29a2af378	x86: KVM guest: fix section mismatch warning in kvmclock.c WARNING: arch/x86/kernel/built-in.o(.text+0x1722c): Section mismatch in reference from the function kvm_setup_secondary_clock() to the function .devinit.text:setup_secondary_APIC_clock() The function kvm_setup_secondary_clock() references the function __devinit setup_secondary_APIC_clock(). This is often because kvm_setup_secondary_clock lacks a __devinit annotation or the annotation of setup_secondary_APIC_clock is wrong. Signed-off-by: Md.Rakib H. Mullick <rakib.mullick@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Avi Kivity <avi@redhat.com>	2008-11-11 20:55:10 +02:00
Linus Torvalds	f21f237cf5	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timers: handle HRTIMER_CB_IRQSAFE_UNLOCKED correctly from softirq context nohz: disable tick_nohz_kick_tick() for now irq: call __irq_enter() before calling the tick_idle_check x86: HPET: enter hpet_interrupt_handler with interrupts disabled x86: HPET: read from HPET_Tn_CMP() not HPET_T0_CMP x86: HPET: convert WARN_ON to WARN_ON_ONCE	2008-11-11 10:53:50 -08:00
Marcelo Tosatti	c41ef344de	KVM: MMU: increase per-vcpu rmap cache alloc size The page fault path can use two rmap_desc structures, if: - walk_addr's dirty pte update allocates one rmap_desc. - mmu_lock is dropped, sptes are zapped resulting in rmap_desc being freed. - fetch->mmu_set_spte allocates another rmap_desc. Increase to 4 for safety. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2008-11-11 20:53:34 +02:00
Ingo Molnar	c280ea5e4c	x86: fix documentation typo in arch/x86/Kconfig Impact: documentation update Chris Snook pointed out that it's Core i7, not Core 7i. Reported-by: Chris Snook <csnook@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-11 19:30:29 +01:00
Thomas Gleixner	a98f8fd24f	x86: apic reset counter on shutdown Impact: avoid spurious lapic timer events on shutdown The apic timer might be close to firing when it is shutdown. We can not really disable the timer - we just mask the interrupt. That way we can get an extra interrupt when it is reenabled. Set the counter to max on shutdown to avoid this. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-11 14:56:55 +01:00
Ivan Vecera	d3ec5cae09	x86: call machine_shutdown and stop all CPUs in native_machine_halt Impact: really halt all CPUs on halt Function machine_halt (resp. native_machine_halt) is empty for x86 architectures. When command 'halt -f' is invoked, the message "System halted." is displayed but this is not really true because all CPUs are still running. There are also similar inconsistencies for other arches (some uses power-off for halt or forever-loop with IRQs enabled/disabled). IMO there should be used the same approach for all architectures OR what does the message "System halted" really mean? This patch fixes it for x86. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-11 14:50:02 +01:00
James Bottomley	6cd10f8db3	x86, voyager: fix smp generic helper voyager breakage Impact: build/boot fix for x86/Voyager This change: \| commit `3d44223327` \| Author: Jens Axboe <jens.axboe@oracle.com> \| Date: Thu Jun 26 11:21:34 2008 +0200 \| \| Add generic helpers for arch IPI function calls didn't wire up the voyager smp call function correctly, so do that here. Also make CONFIG_USE_GENERIC_SMP_HELPERS a def_bool y again, since we now use the generic helpers for every x86 architecture. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jens Axboe <Jens.Axboe@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-11 12:08:53 +01:00
Ingo Molnar	19b3e9671c	tracing: function return tracer, build fix fix: arch/x86/kernel/ftrace.c: In function 'ftrace_return_to_handler': arch/x86/kernel/ftrace.c:112: error: implicit declaration of function 'cpu_clock' cpu_clock() is implicitly included via a number of ways, but its real location is sched.h. (Build failure is triggerable if enough other kernel components are turned off.) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-11 12:03:27 +01:00
Cliff Wickman	a3d732f937	x86, UV: fix redundant creation of sgi_uv Impact: fix double entry creation in /proc There is a collision between two UV functions: both uv_ptc_init() and gru_proc_init() try to make /proc/sgi_uv So move it's creation to a single place: uv_system_init() Signed-off-by: Cliff Wickman <cpw@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-11 11:38:50 +01:00
Ingo Molnar	867f7fb3eb	tracing, x86: function return tracer, fix assembly constraints fix: arch/x86/kernel/ftrace.c: Assembler messages: arch/x86/kernel/ftrace.c:140: Error: missing ')' arch/x86/kernel/ftrace.c:140: Error: junk `(%ebp))' after expression arch/x86/kernel/ftrace.c:141: Error: missing ')' arch/x86/kernel/ftrace.c:141: Error: junk `(%ebp))' after expression the [parent_replaced] is used in an =rm fashion, so that constraint is correct in isolation - but [parent_old] aliases register %0 and uses it in an addressing mode that is only valid with registers - so change the constraint from =rm to =r. This fixes the build failure. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-11 11:12:18 +01:00
Ingo Molnar	f1c4be5eda	tracing, x86: clean up FUNCTION_RET_TRACER Kconfig Impact: cleanup move FUNCTION_RET_TRACER to the X86 select section, where we have all the other options. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-11 10:29:17 +01:00
Frederic Weisbecker	caf4b323b0	tracing, x86: add low level support for ftrace return tracing Impact: add infrastructure for function-return tracing Add low level support for ftrace return tracing. This plug-in stores return addresses on the thread_info structure of the current task. The index of the current return address is initialized when the task is the first one (init) and when a process forks (the child). It is not needed when a task does a sys_execve because after this syscall, it still needs to return on the kernel functions it called. Note that the code of return_to_handler has been suggested by Steven Rostedt as almost all of the ideas of improvements in this V3. For purpose of security, arch/x86/kernel/process_32.c is not traced because __switch_to() changes the current task during its execution. That could cause inconsistency in the stored return address of this function even if I didn't have any crash after testing with tracing on this function enabled. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-11 10:29:11 +01:00
Ingo Molnar	e0cb4ebcd9	Merge branch 'tracing/urgent' into tracing/ftrace Conflicts: kernel/trace/trace.c	2008-11-11 09:40:18 +01:00
Ingo Molnar	ae1e9130bf	sched: rename SCHED_NO_NO_OMIT_FRAME_POINTER => SCHED_OMIT_FRAME_POINTER Impact: cleanup, change .config option name We had this ugly config name for a long time for hysteric raisons. Rename it to a saner name. We still cannot get rid of it completely, until /proc/<pid>/stack usage replaces WCHAN usage for good. We'll be able to do that in the v2.6.29/v2.6.30 timeframe. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-11 08:59:20 +01:00
Rafael J. Wysocki	4694516d19	x86: Make NUMA on 32-bit depend on BROKEN While investigating the failure of hibernation on 32-bit x86 with CONFIG_NUMA set, as described in this message http://marc.info/?l=linux-kernel&m=122634118116226&w=4 I asked some people for help and I was told that it wasn't really worth the effort, because CONFIG_NUMA was generally broken on 32-bit x86 systems and it shouldn't be used in such configs. For this reason, make CONFIG_NUMA depend on BROKEN instead of EXPERIMENTAL on x86-32. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Andi Kleen <andi@firstfloor.org> Cc: Pavel Machek <pavel@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-11-10 13:20:57 -08:00
Matt Fleming	5ceb1a0418	x86: HPET: enter hpet_interrupt_handler with interrupts disabled Some functions that may be called from this handler require that interrupts are disabled. Also, combining IRQF_DISABLED and IRQF_SHARED does not reliably disable interrupts in a handler, so remove IRQF_SHARED from the irq flags (this irq is not shared anyway). Signed-off-by: Matt Fleming <mjf@gentoo.org> Cc: mingo@elte.hu Cc: venkatesh.pallipadi@intel.com Cc: "Will Newton" <will.newton@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-11-10 17:38:07 +01:00
Matt Fleming	89d77a1eb6	x86: HPET: read from HPET_Tn_CMP() not HPET_T0_CMP In hpet_next_event() we check that the value we just wrote to HPET_Tn_CMP(timer) has reached the chip. Currently, we're checking that the value we wrote to HPET_Tn_CMP(timer) is in HPET_T0_CMP, which, if timer is anything other than timer 0, is likely to fail. Signed-off-by: Matt Fleming <mjf@gentoo.org> Cc: mingo@elte.hu Cc: venkatesh.pallipadi@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-11-10 17:38:07 +01:00
Matt Fleming	1de5b08546	x86: HPET: convert WARN_ON to WARN_ON_ONCE It is possible to flood the console with call traces if the WARN_ON condition is true because of the frequency with which this function is called. Signed-off-by: Matt Fleming <mjf@gentoo.org> Cc: mingo@elte.hu Cc: venkatesh.pallipadi@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-11-10 17:38:07 +01:00
Cyrill Gorcunov	ba21ebb6ab	x86: apic - use pr_ macros for logging Impact: cleanup It saves us some source lines and shift the code a bit righter. And a multiline comment style is fixed too :-) Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Acked-by: "Maciej W. Rozycki" <macro@linux-mips.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-10 09:16:41 +01:00
Cyrill Gorcunov	4e0304310f	x86: apic - calibrate_APIC_clock remove redundant irq-enable-disable Impact: cleanup lapic_timer_setup is self-protected with local_irq_save/restore no need to use them in caller and levt is the per-cpu variable so no concurrent access from another cpu. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Acked-by: "Maciej W. Rozycki" <macro@linux-mips.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-10 09:16:40 +01:00
Ingo Molnar	4ecd33d930	Merge commit 'v2.6.28-rc4' into x86/apic	2008-11-10 09:16:27 +01:00
Markus Metzger	f4166c54bf	x86, bts: DS and BTS initialization Impact: widen BTS/PEBS ptrace enablement to more CPU models Move BTS initialisation out of an #ifdef CONFIG_X86_64 guard. Assume core2 BTS and DS layout for future models of family 6 processors. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-10 08:50:32 +01:00
Harvey Harrison	19f47c634e	x86: x86_32 has its own irq_regs definition Impact: cleanup Arches that have their own irq_regs definition are expected to define ARCH_HAS_OWN_IRQ_REGS or else a generic (unused) set will also be defined in lib/irq_regs.c Sparse noticed the unused generic one had no prototype: lib/irq_regs.c:15:1: warning: symbol 'per_cpu____irq_regs' was not declared. Should it be static? Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-10 08:41:47 +01:00
Thomas Gleixner	6c2e94033d	x86: apic honour irq affinity which was set in early boot setup_ioapic_dest() is called after the non boot cpus have been brought up. It sets the irq affinity of all already configured interrupts to all cpus and ignores affinity settings which were done by the early bootup code. If the IRQ_NO_BALANCING or IRQ_AFFINITY_SET flags are set then use the affinity mask from the irq descriptor and not TARGET_CPUS. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-09 22:25:08 +01:00
Ingo Molnar	4fcc50abdf	x86: clean up vget_cycles() Impact: remove unused variable I forgot to remove the now unused "cycles_t cycles" parameter from vget_cycles() - which triggers build warnings as tsc.h is included in a number of files. Remove it. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-09 21:05:43 +01:00
Arjan van de Ven	3044646148	x86: move iomap.h to the new include location a new file was accidentally added to include/asm-x86; move it to the new arch/x86/include/asm location Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2008-11-09 10:07:58 -08:00
Ingo Molnar	cb9e35dce9	x86: clean up rdtsc_barrier() use Impact: cleanup Move rdtsc_barrier() use to vsyscall_64.c where it's relied on, and point out its role in the context of its use. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-08 20:27:00 +01:00
Ingo Molnar	895e031707	Merge branch 'linus' into x86/cleanups	2008-11-08 20:23:02 +01:00
Linus Torvalds	a622cf69b8	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: optimize sched_clock() a bit sched: improve sched_clock() performance	2008-11-08 10:24:28 -08:00
Ingo Molnar	7cbaef9c83	sched: optimize sched_clock() a bit sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling variant of __cycles_2_ns(). Most of the time sched_clock() is called with irqs disabled already. The few places that call it with irqs enabled need to be updated. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-08 17:05:38 +01:00
Ingo Molnar	0d12cdd5f8	sched: improve sched_clock() performance in scheduler-intense workloads native_read_tsc() overhead accounts for 20% of the system overhead: 659567 system_call 41222.9375 686796 schedule 435.7843 718382 __switch_to 665.1685 823875 switch_mm 4526.7857 1883122 native_read_tsc 55385.9412 9761990 total 2.8468 this is large part due to the rdtsc_barrier() that is done before and after reading the TSC. But sched_clock() is not a precise clock in the GTOD sense, using such barriers is completely pointless. So remove the barriers and only use them in vget_cycles(). This improves lat_ctx performance by about 5%. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-08 16:48:19 +01:00
Hiroshi Shimamoto	15002fa9bf	x86: signal: cosmetic unification of setup_sigcontext() Impact: cleanup Make setup_sigcontext() same. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-08 10:16:10 +01:00
Ingo Molnar	a6b0786f7f	Merge branches 'tracing/ftrace', 'tracing/fastboot', 'tracing/nmisafe' and 'tracing/urgent' into tracing/core	2008-11-08 09:34:35 +01:00
Ingo Molnar	01aab518b0	Merge branch 'oprofile-for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into x86/urgent	2008-11-07 19:22:10 +01:00
Linus Torvalds	cb110171a6	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, xen: fix use of pgd_page now that it really does return a page	2008-11-07 09:17:59 -08:00
Andi Kleen	7c64ade53a	oprofile: Fix p6 counter overflow check Fix the counter overflow check for CPUs with counter width > 32 I had a similar change in a different patch that I didn't submit and I didn't notice the problem earlier because it was always tested together. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Robert Richter <robert.richter@amd.com>	2008-11-07 17:34:41 +01:00
Ingo Molnar	258594a138	Merge branch 'sched/urgent' into sched/core	2008-11-07 10:29:58 +01:00
Jeremy Fitzhardinge	d05fdf3160	xen: make sure stray alias mappings are gone before pinning Xen requires that all mappings of pagetable pages are read-only, so that they can't be updated illegally. As a result, if a page is being turned into a pagetable page, we need to make sure all its mappings are RO. If the page had been used for ioremap or vmalloc, it may still have left over mappings as a result of not having been lazily unmapped. This change makes sure we explicitly mop them all up before pinning the page. Unlike aliases created by kmap, the there can be vmalloc aliases even for non-high pages, so we must do the flush unconditionally. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Linux Memory Management List <linux-mm@kvack.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-07 10:05:59 +01:00
Linus Torvalds	a15a82f42c	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: Revert "x86: default to reboot via ACPI" x86: align DirectMap in /proc/meminfo AMD IOMMU: fix lazy IO/TLB flushing in unmap path x86: add smp_mb() before sending INVALIDATE_TLB_VECTOR x86: remove VISWS and PARAVIRT around NR_IRQS puzzle x86: mention ACPI in top-level Kconfig menu x86: size NR_IRQS on 32-bit systems the same way as 64-bit x86: don't allow nr_irqs > NR_IRQS x86/docs: remove noirqbalance param docs x86: don't use tsc_khz to calculate lpj if notsc is passed x86, voyager: fix smp_intr_init() compile breakage AMD IOMMU: fix detection of NP capable IOMMUs	2008-11-06 15:57:24 -08:00
Jeremy Fitzhardinge	47cb2ed9df	x86, xen: fix use of pgd_page now that it really does return a page Impact: fix 32-bit Xen guest boot crash On 32-bit PAE, pud_page, for no good reason, didn't really return a struct page . Since Jan Beulich's fix "i386/PAE: fix pud_page()", pud_page does return a struct page . Because PAE has 3 pagetable levels, the pud level is folded into the pgd level, so pgd_page() is the same as pud_page(), and now returns a struct page *. Update the xen/mmu.c code which uses pgd_page() accordingly. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-06 23:20:47 +01:00
Ken Chen	a87d091434	x86, sched: enable wchan config menu item on 64-bit Enable the wchan config menu item for now on x86-64 arch? This will at least allow people to enable/disable frame pointers on scheduler functions. Signed-off-by: Ken Chen <kenchen@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-06 22:24:51 +01:00
Eduardo Habkost	8d00450d29	Revert "x86: default to reboot via ACPI" This reverts commit `c7ffa6c262`. the assumptio of this change was that this would not break any existing machine. Andrey Borzenkov reported troubles with the ACPI reboot method: the system would hang on reboot, necessiating a power cycle. Probably more systems are affected as well. Also, there are patches queued up for v2.6.29 to disable virtualization on emergency_restart() - which was the original motivation of this change. Reported-by: Andrey Borzenkov <arvidjaar@mail.ru> Bisected-by: Andrey Borzenkov <arvidjaar@mail.ru> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Avi Kivity <avi@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-06 16:05:06 +01:00
Hugh Dickins	b9c3bfc24e	x86: align DirectMap in /proc/meminfo Impact: right-align /proc/meminfo consistent with other fields When the split-LRU patches added Inactive(anon) and Inactive(file) lines to /proc/meminfo, all counts were moved two columns rightwards to fit in. Now move x86's DirectMap lines two columns rightwards to line up. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-06 15:27:37 +01:00
Ingo Molnar	31f297143b	Merge branch 'iommu-fixes-2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent	2008-11-06 15:23:35 +01:00
Joerg Roedel	80be308dfa	AMD IOMMU: fix lazy IO/TLB flushing in unmap path Lazy flushing needs to take care of the unmap path too which is not yet implemented and leads to stale IO/TLB entries. This is fixed by this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2008-11-06 14:59:05 +01:00
KOSAKI Motohiro	fd51b2d7d5	x86: update CONFIG_NUMA description Impact: clarify/update CONFIG_NUMA text CONFIG_NUMA description talk about a bit old thing. So, following changes are better. o CONFIG_NUMA is no longer EXPERIMENTAL o Opteron is not the only processor of NUMA topology on x86_64 no longer, but also Intel Core7i has it. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-06 09:50:38 +01:00
Suresh Siddha	d6f0f39b7d	x86: add smp_mb() before sending INVALIDATE_TLB_VECTOR Impact: fix rare x2apic hang On x86, x2apic mode accesses for sending IPI's don't have serializing semantics. If the IPI receivner refers(in lock-free fashion) to some memory setup by the sender, the need for smp_mb() before sending the IPI becomes critical in x2apic mode. Add the smp_mb() in native_flush_tlb_others() before sending the IPI. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-06 09:41:49 +01:00
Yinghai Lu	7db282fa67	x86: remove VISWS and PARAVIRT around NR_IRQS puzzle Impact: fix warning message when PARAVIRT is set in config Remove stale #ifdef components from our IRQ sizing logic. x86/Voyager is the only holdout. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-06 09:35:34 +01:00
Bjorn Helgaas	da85f865b1	x86: mention ACPI in top-level Kconfig menu Impact: clarify menuconfig text Mention ACPI in the top-level menu to give a clue as to where it lives. This matches what ia64 does. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-06 08:16:19 +01:00
Hiroshi Shimamoto	8735b7d0a2	x86: signal_64: make setup_sigcontext() similar Impact: cleanup remove passing task struct. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-06 08:02:01 +01:00
Hiroshi Shimamoto	ee7d523c12	x86: signal_64: setup fpstate in setup_sigcontext() Impact: cleanup set fpstate field of signal context at setup_sigcontext(). Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-06 08:02:00 +01:00
Hiroshi Shimamoto	99ea1b93bf	x86: ia32_signal: do save_i387_xstate_ia32 at get_sigframe() Impact: cleanup move calling save_i387_xstate_ia32() into get_sigframe() from setup_sigcontext(). Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-06 08:02:00 +01:00
Hiroshi Shimamoto	4b33669e81	x86: signal_32: do save_i387_xstate() at get_sigframe() Impact: cleanup move calling save_i387_xstate() into get_sigframe() from setup_sigcontext() like 64bit. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-06 08:01:59 +01:00
Steven Rostedt	60a7ecf426	ftrace: add quick function trace stop Impact: quick start and stop of function tracer This patch adds a way to disable the function tracer quickly without the need to run kstop_machine. It adds a new variable called function_trace_stop which will stop the calls to functions from mcount when set. This is just an on/off switch and does not handle recursion like preempt_disable(). It's main purpose is to help other tracers/debuggers start and stop tracing fuctions without the need to call kstop_machine. The config option HAVE_FUNCTION_TRACE_MCOUNT_TEST is added for archs that implement the testing of the function_trace_stop in the mcount arch dependent code. Otherwise, the test is done in the C code. x86 is the only arch at the moment that supports this. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-06 07:50:51 +01:00
Yinghai Lu	1b48976880	x86: size NR_IRQS on 32-bit systems the same way as 64-bit Impact: make NR_IRQS big enough for system with lots of apic/pins If lots of IO_APIC's are there (or can be there), size the same way as 64-bit, depending on MAX_IO_APICS and NR_CPUS. This fixes the boot problem reported by Ben Hutchings on a 32-bit server with 5 IO-APICs and 240 IO-APIC pins. Signed-off-by: Yinghai <yinghai@kernel.org> Tested-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-06 07:23:22 +01:00
Ben Hutchings	c78d0cf292	x86: don't allow nr_irqs > NR_IRQS Impact: fix boot hang on 32-bit systems with more than 224 IO-APIC pins On some 32-bit systems with a lot of IO-APICs probe_nr_irqs() can return a value larger than NR_IRQS. This will lead to probe_irq_on() overrunning the irq_desc array. I hit this when running net-next-2.6 (close to 2.6.28-rc3) on a Supermicro dual Xeon system. NR_IRQS is 224 but probe_nr_irqs() detects 5 IOAPICs and returns 240. Here are the log messages: Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0]) Tue Nov 4 16:53:47 2008 IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23 Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x02] address[0xfec81000] gsi_base[24]) Tue Nov 4 16:53:47 2008 IOAPIC[1]: apic_id 2, version 32, address 0xfec81000, GSI 24-47 Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x03] address[0xfec81400] gsi_base[48]) Tue Nov 4 16:53:47 2008 IOAPIC[2]: apic_id 3, version 32, address 0xfec81400, GSI 48-71 Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x04] address[0xfec82000] gsi_base[72]) Tue Nov 4 16:53:47 2008 IOAPIC[3]: apic_id 4, version 32, address 0xfec82000, GSI 72-95 Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x05] address[0xfec82400] gsi_base[96]) Tue Nov 4 16:53:47 2008 IOAPIC[4]: apic_id 5, version 32, address 0xfec82400, GSI 96-119 Tue Nov 4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge) Tue Nov 4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) Tue Nov 4 16:53:47 2008 Enabling APIC mode: Flat. Using 5 I/O APICs Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-06 07:23:21 +01:00
Russ Anderson	23c357003b	x86: uv: Add UV reserved page bios call Add UV bios call to get the address of the reserved page. Signed-off-by: Russ Anderson <rja@sgi.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-05 20:30:25 -08:00
Russ Anderson	e8929c8a6a	x86: uv: Add UV memory protection bios call Add UV bios call to change memory protections. Signed-off-by: Russ Anderson <rja@sgi.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-05 20:30:20 -08:00
Russ Anderson	64ccf2f9a7	x86: uv: Add UV watchlist bios call Add UV bios calls to allocate and free watchlists. Signed-off-by: Russ Anderson <rja@sgi.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-05 20:30:15 -08:00
Uros Bizjak	838e8bb71d	x86: Implement change_bit with immediate operand as "lock xorb" Impact: Minor optimization. Implement change_bit with immediate bit count as "lock xorb". This is similar to "lock orb" and "lock andb" for set_bit and clear_bit functions. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-05 09:51:02 -08:00
Ingo Molnar	9fcd18c9e6	sched: re-tune balancing Impact: improve wakeup affinity on NUMA systems, tweak SMP systems Given the fixes+tweaks to the wakeup-buddy code, re-tweak the domain balancing defaults on NUMA and SMP systems. Turn on SD_WAKE_AFFINE which was off on x86 NUMA - there's no reason why we would not want to have wakeup affinity across nodes as well. (we already do this in the standard NUMA template.) lat_ctx on a NUMA box is particularly happy about this change: before: \| phoenix:~/l> ./lat_ctx -s 0 2 \| "size=0k ovr=2.60 \| 2 5.70 after: \| phoenix:~/l> ./lat_ctx -s 0 2 \| "size=0k ovr=2.65 \| 2 2.07 a 2.75x speedup. pipe-test is similarly happy about it too: \| phoenix:~/sched-tests> ./pipe-test \| 18.26 usecs/loop. \| 14.70 usecs/loop. \| 14.38 usecs/loop. \| 10.55 usecs/loop. # +WAKE_AFFINE on domain0+domain1 \| 8.63 usecs/loop. \| 8.59 usecs/loop. \| 9.03 usecs/loop. \| 8.94 usecs/loop. \| 8.96 usecs/loop. \| 8.63 usecs/loop. Also: - disable SD_BALANCE_NEWIDLE on NUMA and SMP domains (keep it for siblings) - enable SD_WAKE_BALANCE on SMP domains Sysbench+postgresql improves all around the board, quite significantly: .28-rc3-11474e2c .28-rc3-11474e2c-tune ------------------------------------------------- 1: 571 688 +17.08% 2: 1236 1206 -2.55% 4: 2381 2642 +9.89% 8: 4958 5164 +3.99% 16: 9580 9574 -0.07% 32: 7128 8118 +12.20% 64: 7342 8266 +11.18% 128: 7342 8064 +8.95% 256: 7519 7884 +4.62% 512: 7350 7731 +4.93% ------------------------------------------------- SUM: 55412 59341 +6.62% So it's a win both for the runup portion, the peak area and the tail. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-05 18:04:38 +01:00
Alok Kataria	fd8cd7e191	x86: vmware: look for DMI string in the product serial key Impact: Should permit VMware detection on older platforms where the vendor is changed. Could theoretically cause a regression if some weird serial number scheme contains the string "VMware" by pure chance. Seems unlikely, especially with the mixed case. In some user configured cases, VMware may choose not to put a VMware specific DMI string, but the product serial key is always there and is VMware specific. Add a interface to check the serial key, when checking for VMware in the DMI information. Signed-off-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-04 13:59:00 -08:00
Ingo Molnar	6cf87efbc7	x86 debug: mark early_printk.o as notrace Impact: do not do function-tracing in the early-printk code this is useful when earlyprintk=vga,keep is used to debug tracer plugins. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-04 10:42:23 +01:00
Hiroshi Shimamoto	124ffe1456	x86: signal_64: remove unused code in __setup_rt_frame() Impact: cleanup sizeof(*set) is always 8 on x86_64. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-04 10:00:29 +01:00
Alok Kataria	70de9a9704	x86: don't use tsc_khz to calculate lpj if notsc is passed Impact: fix udelay when "notsc" boot parameter is passed With notsc passed on commandline, tsc may not be used for udelays, make sure that we do not use tsc_khz to calculate the lpj value in such cases. Reported-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Alok N Kataria <akataria@vmware.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-04 09:55:26 +01:00
Alok Kataria	6bdbfe9991	x86: VMware: Fix vmware_get_tsc code Impact: Fix possible failure to calibrate the TSC on Vmware near 4 GHz The current version of the code to get the tsc frequency from the VMware hypervisor, will be broken on processor with frequency (4G-1) HZ, because on such processors eax will have UINT_MAX and that would be legitimate. We instead check that EBX did change to decide if we were able to read the frequency from the hypervisor. Signed-off-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-03 11:35:57 -08:00
Linus Torvalds	da4a22cba7	Merge branch 'io-mappings-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'io-mappings-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: io mapping: clean up #ifdefs io mapping: improve documentation i915: use io-mapping interfaces instead of a variety of mapping kludges resources: add io-mapping functions to dynamically map large device apertures x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmaps	2008-11-03 10:15:40 -08:00
Keith Packard	e5beae1690	io mapping: clean up #ifdefs Impact: cleanup clean up ifdefs: change #ifdef CONFIG_X86_32/64 to CONFIG_HAVE_ATOMIC_IOMAP. flip around the #ifdef sections to clean up the structure. Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-03 18:21:45 +01:00
Steven Rostedt	7e5e26a3d8	ftrace: fix hardirq header for non ftrace archs Impact: build fix for non-ftrace architectures Not all archs implement ftrace, and therefore do not have an asm/ftrace.h. This patch corrects the problem. The ftrace_nmi_enter/exit now must be defined for all archs that implement dynamic ftrace. Currently, only x86 does. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-03 11:03:43 +01:00
James Bottomley	73557af5bf	x86, voyager: fix smp_intr_init() compile breakage Impact: fix x86/Voyager build Looks like this became static on the rest of x86. Fix it up by adding an external definition to mach-voyager/setup.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-03 10:52:21 +01:00
Ingo Molnar	7a895f53cd	Merge branches 'tracing/ftrace', 'tracing/markers', 'tracing/mmiotrace', 'tracing/nmisafe', 'tracing/tracepoints' and 'tracing/urgent' into tracing/core	2008-11-03 10:34:23 +01:00
Gary Hade	3555105333	x86: add memory hotremove config option Impact: enable CONFIG_MEMORY_HOTREMOVE feature on x86. (default-off) Memory hotremove functionality can currently be configured into the ia64, powerpc, and s390 kernels. This patch makes it possible to configure the memory hotremove functionality into the x86 kernel as well. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Gary Hade <garyhade@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-03 10:03:38 +01:00
Alok Kataria	395628ef4e	x86: Skip verification by the watchdog for TSC clocksource. Impact: Changes timekeeping on Vmware (or with tsc=reliable). This is achieved by resetting the CLOCKSOURCE_MUST_VERIFY flag. We add a tsc=reliable commandline option to enable this. This enables legacy hardware without HPET, LAPIC, or ACPI timers to enter high-resolution timer mode. Along with that have extended this to be used in virtualization environement too. Now we also set this flag if the X86_FEATURE_TSC_RELIABLE bit is set. This is important since there is a wrap-around problem with the acpi_pm timer. The acpi_pm counter is just 24bits and this can overflow in ~4 seconds. With the NO_HZ kernels in virtualized environment, there can be situations when the guest is descheduled for longer duration, as a result we may miss the wrap of the acpi counter. When TSC is used as a clocksource and acpi_pm timer is being used as the watchdog clocksource this error in acpi_pm results in TSC being marked as unstable, and essentially results in time dropping in chunks of 4 seconds whenever this wrap is missed. Since the virtualized TSC is reliable on VMware, we should always use the TSCs clocksource on VMware, so we skip the verfication at runtime, by checking for the feature bit. Since we reset the flag for mgeode systems too, i have combined the mgeode case with the feature bit check. Signed-off-by: Jeff Hansen <jhansen@cardaccess-inc.com> Signed-off-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: Dan Hecht <dhecht@vmware.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-01 18:59:03 -07:00
Alok Kataria	eca0cd028b	x86: Add a synthetic TSC_RELIABLE feature bit. Impact: Changes timebase calibration on Vmware. Use the synthetic TSC_RELIABLE bit to workaround virtualization anomalies. Virtual TSCs can be kept nearly in sync, but because the virtual TSC offset is set by software, it's not perfect. So, the TSC synchronization test can fail. Even then the TSC can be used as a clocksource since the VMware platform exports a reliable TSC to the guest for timekeeping purposes. Use this bit to check if we need to skip the TSC sync checks. Along with this also set the CONSTANT_TSC bit when on VMware, since we still want to use TSC as clocksource on VM running over hardware which has unsynchronized TSC's (opteron's), since the hypervisor will take care of providing consistent TSC to the guest. Signed-off-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: Dan Hecht <dhecht@vmware.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-01 18:58:01 -07:00
Alok Kataria	88b094fb8d	x86: Hypervisor detection and get tsc_freq from hypervisor Impact: Changes timebase calibration on Vmware. v3->v2 : Abstract the hypervisor detection and feature (tsc_freq) request behind a hypervisor.c file v2->v1 : Add a x86_hyper_vendor field to the cpuinfo_x86 structure. This avoids multiple calls to the hypervisor detection function. This patch adds function to detect if we are running under VMware. The current way to check if we are on VMware is following, # check if "hypervisor present bit" is set, if so read the 0x40000000 cpuid leaf and check for "VMwareVMware" signature. # if the above fails, check the DMI vendors name for "VMware" string if we find one we query the VMware hypervisor port to check if we are under VMware. The DMI + "VMware hypervisor port check" is needed for older VMware products, which don't implement the hypervisor signature cpuid leaf. Also note that since we are checking for the DMI signature the hypervisor port should never be accessed on native hardware. This patch also adds a hypervisor_get_tsc_freq function, instead of calibrating the frequency which can be error prone in virtualized environment, we ask the hypervisor for it. We get the frequency from the hypervisor by accessing the hypervisor port if we are running on VMware. Other hypervisors too can add code to the generic routine to get frequency on their platform. Signed-off-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: Dan Hecht <dhecht@vmware.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-01 18:57:08 -07:00
Alok Kataria	49ab56ac6e	x86: add X86_FEATURE_HYPERVISOR feature bit Impact: Number declaration only. Add X86_FEATURE_HYPERVISOR bit (CPUID level 1, ECX, bit 31). Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-11-01 18:45:37 -07:00
Linus Torvalds	67d1128425	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix AMDC1E and XTOPOLOGY conflict in cpufeature x86: build fix	2008-11-01 10:36:30 -07:00
Linus Torvalds	1f98757776	x86: Clean up late e820 resource allocation This makes the late e820 resources use 'insert_resource_expand_to_fit()' instead of doing a 'reserve_region_with_split()', and also avoids marking them as IORESOURCE_BUSY. This results in us being perfectly happy to use pre-existing PCI resources even if they were marked as being in a reserved region, while still avoiding any _new_ allocations in the reserved regions. It also makes for a simpler and more accurate resource tree. Example resource allocation from Jonathan Corbet, who has firmware that has an e820 reserved entry that covered a big range (e0000000-fed003ff), and that had various PCI resources in it set up by firmware. With old kernels, the reserved range would force us to re-allocate all pre-existing PCI resources, and his reserved range would end up looking like this: e0000000-fed003ff : reserved fec00000-fec00fff : IOAPIC 0 fed00000-fed003ff : HPET 0 where only the pre-allocated special regions (IOAPIC and HPET) were kept around. With 2.6.28-rc2, which uses 'reserve_region_with_split()', Jonathan's resource tree looked like this: e0000000-fe7fffff : reserved fe800000-fe8fffff : PCI Bus 0000:01 fe800000-fe8fffff : reserved fe900000-fe9d9aff : reserved fe9d9b00-fe9d9bff : 0000:00:1f.3 fe9d9b00-fe9d9bff : reserved fe9d9c00-fe9d9fff : 0000:00:1a.7 fe9d9c00-fe9d9fff : reserved fe9da000-fe9dafff : 0000:00:03.3 fe9da000-fe9dafff : reserved fe9db000-fe9dbfff : 0000:00:19.0 fe9db000-fe9dbfff : reserved fe9dc000-fe9dffff : 0000:00:1b.0 fe9dc000-fe9dffff : reserved fe9e0000-fe9fffff : 0000:00:19.0 fe9e0000-fe9fffff : reserved fea00000-fea7ffff : 0000:00:02.0 fea00000-fea7ffff : reserved fea80000-feafffff : 0000:00:02.1 fea80000-feafffff : reserved feb00000-febfffff : 0000:00:02.0 feb00000-febfffff : reserved fec00000-fed003ff : reserved fec00000-fec00fff : IOAPIC 0 fed00000-fed003ff : HPET 0 and because the reserved entry had been split and moved into the individual resources, and because it used the IORESOURCE_BUSY flag, the drivers that actually wanted to _use_ those resources couldn't actually attach to them: e1000e 0000:00:19.0: BAR 0: can't reserve mem region [0xfe9e0000-0xfe9fffff] HDA Intel 0000:00:1b.0: BAR 0: can't reserve mem region [0xfe9dc000-0xfe9dffff] with this patch, the resource tree instead becomes e0000000-fed003ff : reserved fe800000-fe8fffff : PCI Bus 0000:01 fe9d9b00-fe9d9bff : 0000:00:1f.3 fe9d9c00-fe9d9fff : 0000:00:1a.7 fe9d9c00-fe9d9fff : ehci_hcd fe9da000-fe9dafff : 0000:00:03.3 fe9db000-fe9dbfff : 0000:00:19.0 fe9db000-fe9dbfff : e1000e fe9dc000-fe9dffff : 0000:00:1b.0 fe9dc000-fe9dffff : ICH HD audio fe9e0000-fe9fffff : 0000:00:19.0 fe9e0000-fe9fffff : e1000e fea00000-fea7ffff : 0000:00:02.0 fea80000-feafffff : 0000:00:02.1 feb00000-febfffff : 0000:00:02.0 fec00000-fec00fff : IOAPIC 0 fed00000-fed003ff : HPET 0 ie the one reserved region now ends up surrounding all the PCI resources that were allocated inside of it by firmware, and because it is not marked BUSY, drivers have no problem attaching to the pre-allocated resources. Reported-and-tested-by: Jonathan Corbet <corbet@lwn.net> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Robert Hancock <hancockr@shaw.ca> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-11-01 10:17:22 -07:00
Alok Kataria	b2bcc7b299	x86: add a synthetic TSC_RELIABLE feature bit Impact: None, bit reservation only Add a synthetic TSC_RELIABLE feature bit which will be used to mark TSC as reliable so that we could skip all the runtime checks for TSC stablity, which have false positives in virtual environment. Signed-off-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: Dan Hecht <dhecht@vmware.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-10-31 14:44:19 -07:00
Zhaolei	a376f30a95	x86: avoid duplicate running of pud_offset and pmd_offset in one_md_table_init() Impact: simplify implementation, cleanup If !(pgd_val(*pgd) & _PAGE_PRESENT) in PAE mode, we need not get value of pmd_table again. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-31 11:03:17 +01:00
Venki Pallipadi	2576c99917	x86: fix AMDC1E and XTOPOLOGY conflict in cpufeature Impact: fix xsave slowdown regression Fix two features from conflicting in feature bits. Fixes this performance regression: Subject: cpu2000(both float and int) 13% regression with 2.6.28-rc1 http://lkml.org/lkml/2008/10/28/36 Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Bisected-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-31 11:01:40 +01:00
Steven Rostedt	a26a2a2739	ftrace: nmi safe code clean ups Impact: cleanup This patch cleans up the NMI safe code for dynamic ftrace as suggested by Andrew Morton. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-31 10:29:17 +01:00

... 3 4 5 6 7 ...

5341 Commits