o A recent change to vmlinux.ld.S file broke kexec as now resulting vmlinux
program headers are overlapping in physical address space.
o Now all the vsyscall related sections are placed after data and after
that mostly init data sections are placed. To avoid physical overlap
among phdrs, there are three possible solutions.
- Place vsyscall sections also in data phdrs instead of user
- move vsyscal sections after init data in bss.
- create another phdrs say data.init and move all the sections
after vsyscall into this new phdr.
o This patch implements the third solution.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Magnus Damm <magnus@valinux.co.jp>
Cc: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
TARGET_CPUS is the default irq routing poicy. It specifies which cpus the
kernel should aim an irq at. In physflat delivery mode we can route an irq to
a single cpu. But that doesn't mean our default policy should only be a
single cpu is allowed.
By allowing the irq routing code to select from multiple cpus this enables
systems with more irqs then we can service on a single processor to actually
work.
I just audited and tested the code and irqbalance doesn't care, and the
io_apic.c doesn't care if we have extra cpus in the mask. Everything will use
or assume we are using the lowest numbered cpu in the mask if we can't use
them all.
So this should result in no behavior changes except on systems that need it.
Thanks for YH Lu for spotting this problem in his testing.
Cc: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Jan convinced me that it was unnecessary because the assembly stubs do
this already on the stack.
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Thanks to YH Lu for spotting this. It appears I missed this function when I
refactored allocate_irq_vector and introduced irq_domain, with the result that
all retriggered irqs would go to cpu 0 even if we were not prepared to receive
them there.
While reviewing YH's patch I also noticed that this function was missing
locking, and since I am now reading two values from two diffrent arrays that
looks like a race we might be able to hit in the real world.
Cc: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch:
- out of range system calls failing to return -ENOSYS under
system call tracing
[AK: split out from another patch by Jan as separate bugfix]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
The fake return address was being set to __KERNEL_PDA, rather than 0.
Push it earlier while %eax still equals 0.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Andrew Morton <akpm@osdl.org>
If function change_page_attr_addr calls revert_page to revert
to original pte value, mk_pte_phys does not mask NX bit. If NX bit
is set on no NX hardware supported x86_64 machine, there is will
be RSVD type page fault and system will crash. This patch adds NX
mask bit for PTE entry.
Signed-off-by: bibo,mao <bibo.mao@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This changes the dwarf2 unwinder to do a binary search for CIEs
instead of a linear work. The linker is unfortunately not
able to build a proper lookup table at link time, instead it creates
one at runtime as soon as the bootmem allocator is usable (so you'll continue
using the linear lookup for the first [hopefully] few calls).
The code should be ready to utilize a build-time created table once
a fixed linker becomes available.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This avoids some problems with gcc 4.x and earlier generating
invalid unwind information. In 4.1 the option is default
when unwind information is enabled.
And it seems to generate smaller code too, so it's probably
a good thing on its own. With gcc 4.0:
i386:
4683198 902112 480868 6066178 5c9002 vmlinux (before)
4449895 902112 480868 5832875 5900ab vmlinux (after)
x86-64:
4939761 1449584 648216 7037561 6b6279 vmlinux (before)
4854193 1449584 648216 6951993 6a1439 vmlinux (after)
On 4.1 it shouldn't make much difference because it is
default when unwind is enabled anyways.
Suggested by Michael Matz and Jan Beulich
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Currently some code pieces assume that address returned by find_e820_area()
are page aligned. But looks like find_e820_area() had no such intention
and hence one might end up stomping over some of the data. One such case
is bootmem allocator initialization code stomped over bss.
This patch modified find_e820_area() to return page aligned address. This
might be little wasteful of memory but at the same time probably it is
easier to handle page aligned memory.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
The arch/x86_64/pci directory was giving problems in a wierd cross-compile
environment. The exact cause is unknown, but the Makefile used CFLAGS
instead of EXTRA_CFLAGS. From what I can tell from
Documentation/kbuild/makefiles.txt, CFLAGS should not be used for this, it
should be EXTRA_CFLAGS. And it solves the cross-compile problem.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Vojtech Pavlik <vojtech@suse.cz>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
This was copied, pasted but not edited.
Cc: Andi Kleen <ak@muc.de>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch corrects the logic used in srat.c to figure out what
parsing what action to take when registering hot-add areas. Hot-add
areas should only be added to the node information for the
MEMORY_HOTPLUG_RESERVE case. When booting MEMORY_HOTPLUG_SPARSE hot-add
areas on everything but the last node are getting include in the node
data and during kernel boot the pages are setup then the kernel dies
when the pages are used. This patch fixes this issue.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This was apparently missed by the move to the generic IRQ code.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
check_perm() does not drop the reference to the module when kzalloc()
failure occurs.
Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The loop within ocfs2_zero_extend() can execute for a long time, causing
spurious soft lockup warnings.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The page zeroing code was missing the region between old i_size and new
i_size for those extends that didn't actually require a change in space
allocation.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This was causing some folks to incorrectly get -EBUSY during rename.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This patch deletes redundant memcmp() while looking up in rb tree.
Signed-off-by: Akinbou Mita <akinobu.mita@gmail.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This reverts commit 4596c75c23 as
requested by Olaf Hering. It causes compile errors, and says Olaf:
"This change is also wrong, the autoloading works perfect with 2.6.18,
no need to add random PCI ids.
See commit a0245f7ad5, platform devices
have now a modalias entry in sysfs. The network card is not a PCI
device."
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (36 commits)
[Bluetooth] Fix HID disconnect NULL pointer dereference
[Bluetooth] Add missing entry for Nokia DTL-4 PCMCIA card
[Bluetooth] Add support for newer ANYCOM USB dongles
[NET]: Can use __get_cpu_var() instead of per_cpu() in loopback driver.
[IPV4] inet_peer: Group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS
[TCP]: One NET_INC_STATS() could be NET_INC_STATS_BH in tcp_v4_err()
[NETFILTER]: Missing check for CAP_NET_ADMIN in iptables compat layer
[NETPOLL]: initialize skb for UDP
[IPV6]: Fix route.c warnings when multiple tables are disabled.
[TG3]: Bump driver version and release date.
[TG3]: Add lower bound checks for tx ring size.
[TG3]: Fix set ring params tx ring size implementation
[NET]: reduce per cpu ram used for loopback stats
[IPv6] route: Fix prohibit and blackhole routing decision
[DECNET]: Fix input routing bug
[TCP]: Bound TSO defer time
[IPv4] fib: Remove unused fib_config members
[IPV6]: Always copy rt->u.dst.error when copying a rt6_info.
[IPV6]: Make IPV6_SUBTREES depend on IPV6_MULTIPLE_TABLES.
[IPV6]: Clean up BACKTRACK().
...
Fix one more compile breakage caused by the post -rc1 IRQ changes.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is suitable for just about any 2.6 kernel. It should go in
2.6.19 and 2.6.18.2 and possible even the .17 and .16 stable series.
This is a long standing bug that seems to have only recently become
apparent, presumably due to increasing use of NFS over TCP - many
distros seem to be making it the default.
The SK_CONN bit gets set when a listening socket may be ready
for an accept, just as SK_DATA is set when data may be available.
It is entirely possible for svc_tcp_accept to be called with neither
of these set. It doesn't happen often but there is a small race in
svc_sock_enqueue as SK_CONN and SK_DATA are tested outside the
spin_lock. They could be cleared immediately after the test and
before the lock is gained.
This normally shouldn't be a problem. The sockets are non-blocking so
trying to read() or accept() when ther is nothing to do is not a problem.
However: svc_tcp_recvfrom makes the decision "Should I accept() or
should I read()" based on whether SK_CONN is set or not. This usually
works but is not safe. The decision should be based on whether it is
a TCP_LISTEN socket or a TCP_CONNECTED socket.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Adrian Bunk <bunk@stusta.de>
Cc: <stable@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A disk generated some I/O error, after it, I hitted
J_ASSERT(transaction->t_updates > 0) in journal_stop().
It seems to happened on ext3_truncate() path from stack trace. Then,
maybe the following case may trigger J_ASSERT(transaction->t_updates > 0).
ext3_truncate()
-> ext3_free_branches()
-> ext3_journal_test_restart()
-> ext3_journal_restart()
-> journal_restart()
transaction->t_updates--;
/* another process aborted journal */
-> start_this_handle()
returns -EROFS without transaction->t_updates++;
-> ext3_journal_stop()
-> journal_stop()
J_ASSERT(transaction->t_updates > 0)
If journal was aborted in middle of journal_restart(), ext3_truncate()
may trigger J_ASSERT().
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a size check in smi_data_write to prevent possible wrapping problems
with large pos values when calling smi_data_buf_realloc on 32-bit.
Signed-off-by: Doug Warzecha <Douglas_Warzecha@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These returns should be negative, like the others in this function.
Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
--=-=-=
from mm/memory.c:
1434 static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va)
1435 {
1436 /*
1437 * If the source page was a PFN mapping, we don't have
1438 * a "struct page" for it. We do a best-effort copy by
1439 * just copying from the original user address. If that
1440 * fails, we just zero-fill it. Live with it.
1441 */
1442 if (unlikely(!src)) {
1443 void *kaddr = kmap_atomic(dst, KM_USER0);
1444 void __user *uaddr = (void __user *)(va & PAGE_MASK);
1445
1446 /*
1447 * This really shouldn't fail, because the page is there
1448 * in the page tables. But it might just be unreadable,
1449 * in which case we just give up and fill the result with
1450 * zeroes.
1451 */
1452 if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
1453 memset(kaddr, 0, PAGE_SIZE);
1454 kunmap_atomic(kaddr, KM_USER0);
#### D-cache have to be flushed here.
#### It seems it is just forgotten.
1455 return;
1456
1457 }
1458 copy_user_highpage(dst, src, va);
#### Ok here. flush_dcache_page() called from this func if arch need it
1459 }
Following is the patch fix this issue:
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Acked-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Qooting Adrian:
- net/sunrpc/svc.c uses highest_possible_node_id()
- include/linux/nodemask.h says highest_possible_node_id() is
out-of-line #if MAX_NUMNODES > 1
- the out-of-line highest_possible_node_id() is in lib/cpumask.c
- lib/Makefile: lib-$(CONFIG_SMP) += cpumask.o
CONFIG_ARCH_DISCONTIGMEM_ENABLE=y, CONFIG_SMP=n, CONFIG_SUNRPC=y
-> highest_possible_node_id() is used in net/sunrpc/svc.c
CONFIG_NODES_SHIFT defined and > 0
-> include/linux/numa.h: MAX_NUMNODES > 1
-> compile error
The bug is not present on architectures where ARCH_DISCONTIGMEM_ENABLE
depends on NUMA (but m32r isn't the only affected architecture).
So move the function into page_alloc.c
Cc: Adrian Bunk <bunk@stusta.de>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Interrupts must be disabled during alternative instruction patching. On
systems with high timer IRQ rates, or when running in an emulator, timing
differences can result in random kernel panics because of running partially
patched instructions. This doesn't yet fix NMIs, which requires extricating
the patch code from the late bug checking and is logically separate (and also
less likely to cause problems).
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We are using NFS_REPLAY_ME as a special error value that is never leaked to
clients. That works fine; the only problem is mixing host- and network-
endian values in the same objects. Network-endian equivalent would work just
as fine; switch to it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
don't use the same variable to store NFS and host error values
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
don't use the same variable to store NFS and host error values
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>