Do not use IRQF_SHARED, these interrupt numbers should all
be unique.
Also use name strings without spaces in them just like
PCI controller drivers do, for consistency.
Signed-off-by: David S. Miller <davem@davemloft.net>
The idea is to move more and more things into the pbm,
with the eventual goal of eliminating the pci_controller_info
entirely as there really isn't any need for it.
This stage of the transformations requires some reworking of
the PCI error interrupt handling.
It might be tricky to get rid of the pci_controller_info parenting for
a few reasons:
1) When we get an uncorrectable or correctable error we want
to interrogate the IOMMU and streaming cache of both
PBMs for error status. These errors come from the UPA
front-end which is shared between the two PBM PCI bus
segments.
Historically speaking this is why I choose the datastructure
hierarchy of pci_controller_info-->pci_pbm_info
2) The probing does a portid/devhandle match to look for the
'other' pbm, but this is entirely an artifact and can be
eliminated trivially.
What we could do to solve #1 is to have a "buddy" pointer from one pbm
to another.
Signed-off-by: David S. Miller <davem@davemloft.net>
Namely bus-range and ino-bitmap.
This allows us also to eliminate pci_controller_info's
pci_{first,last}_busno fields as only the pbm ones are
used now.
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement utimensat(2) which is an extension to futimesat(2) in that it
a) supports nano-second resolution for the timestamps
b) allows to selectively ignore the atime/mtime value
c) allows to selectively use the current time for either atime or mtime
d) supports changing the atime/mtime of a symlink itself along the lines
of the BSD lutimes(3) functions
For this change the internally used do_utimes() functions was changed to
accept a timespec time value and an additional flags parameter.
Additionally the sys_utime function was changed to match compat_sys_utime
which already use do_utimes instead of duplicating the work.
Also, the completely missing futimensat() functionality is added. We have
such a function in glibc but we have to resort to using /proc/self/fd/* which
not everybody likes (chroot etc).
Test application (the syscall number will need per-arch editing):
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <sys/time.h>
#include <stddef.h>
#include <syscall.h>
#define __NR_utimensat 280
#define UTIME_NOW ((1l << 30) - 1l)
#define UTIME_OMIT ((1l << 30) - 2l)
int
main(void)
{
int status = 0;
int fd = open("ttt", O_RDWR|O_CREAT|O_EXCL, 0666);
if (fd == -1)
error (1, errno, "failed to create test file \"ttt\"");
struct stat64 st1;
if (fstat64 (fd, &st1) != 0)
error (1, errno, "fstat failed");
struct timespec t[2];
t[0].tv_sec = 0;
t[0].tv_nsec = 0;
t[1].tv_sec = 0;
t[1].tv_nsec = 0;
if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
error (1, errno, "utimensat failed");
struct stat64 st2;
if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
{
puts ("atim not reset to zero");
status = 1;
}
if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
{
puts ("mtim not reset to zero");
status = 1;
}
if (status != 0)
goto out;
t[0] = st1.st_atim;
t[1].tv_sec = 0;
t[1].tv_nsec = UTIME_OMIT;
if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
error (1, errno, "utimensat failed");
if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
|| st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
{
puts ("atim not set");
status = 1;
}
if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
{
puts ("mtim changed from zero");
status = 1;
}
if (status != 0)
goto out;
t[0].tv_sec = 0;
t[0].tv_nsec = UTIME_OMIT;
t[1] = st1.st_mtim;
if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
error (1, errno, "utimensat failed");
if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
|| st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
{
puts ("mtim changed from original time");
status = 1;
}
if (st2.st_mtim.tv_sec != st1.st_mtim.tv_sec
|| st2.st_mtim.tv_nsec != st1.st_mtim.tv_nsec)
{
puts ("mtim not set");
status = 1;
}
if (status != 0)
goto out;
sleep (2);
t[0].tv_sec = 0;
t[0].tv_nsec = UTIME_NOW;
t[1].tv_sec = 0;
t[1].tv_nsec = UTIME_NOW;
if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
error (1, errno, "utimensat failed");
if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
struct timeval tv;
gettimeofday(&tv,NULL);
if (st2.st_atim.tv_sec <= st1.st_atim.tv_sec
|| st2.st_atim.tv_sec > tv.tv_sec)
{
puts ("atim not set to NOW");
status = 1;
}
if (st2.st_mtim.tv_sec <= st1.st_mtim.tv_sec
|| st2.st_mtim.tv_sec > tv.tv_sec)
{
puts ("mtim not set to NOW");
status = 1;
}
if (symlink ("ttt", "tttsym") != 0)
error (1, errno, "cannot create symlink");
t[0].tv_sec = 0;
t[0].tv_nsec = 0;
t[1].tv_sec = 0;
t[1].tv_nsec = 0;
if (syscall(__NR_utimensat, AT_FDCWD, "tttsym", t, AT_SYMLINK_NOFOLLOW) != 0)
error (1, errno, "utimensat failed");
if (lstat64 ("tttsym", &st2) != 0)
error (1, errno, "lstat failed");
if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
{
puts ("symlink atim not reset to zero");
status = 1;
}
if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
{
puts ("symlink mtim not reset to zero");
status = 1;
}
if (status != 0)
goto out;
t[0].tv_sec = 1;
t[0].tv_nsec = 0;
t[1].tv_sec = 1;
t[1].tv_nsec = 0;
if (syscall(__NR_utimensat, fd, NULL, t, 0) != 0)
error (1, errno, "utimensat failed");
if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
if (st2.st_atim.tv_sec != 1 || st2.st_atim.tv_nsec != 0)
{
puts ("atim not reset to one");
status = 1;
}
if (st2.st_mtim.tv_sec != 1 || st2.st_mtim.tv_nsec != 0)
{
puts ("mtim not reset to one");
status = 1;
}
if (status == 0)
puts ("all OK");
out:
close (fd);
unlink ("ttt");
unlink ("tttsym");
return status;
}
[akpm@linux-foundation.org: add missing i386 syscall table entry]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Cc: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch moves the die notifier handling to common code. Previous
various architectures had exactly the same code for it. Note that the new
code is compiled unconditionally, this should be understood as an appel to
the other architecture maintainer to implement support for it aswell (aka
sprinkling a notify_die or two in the proper place)
arm had a notifiy_die that did something totally different, I renamed it to
arm_notify_die as part of the patch and made it static to the file it's
declared and used at. avr32 used to pass slightly less information through
this interface and I brought it into line with the other architectures.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
[bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SERIAL] sunsu: Fix section mismatch warnings.
[SPARC64]: pgtable_cache_init() should be __init.
[SPARC64]: Fix section mismatch warnings in arch/sparc64/kernel/prom.c
[SPARC64]: Fix section mismatch warnings in arch/sparc64/kernel/pci.c
[SPARC64]: Fix section mismatch warnings in arch/sparc64/kernel/console.c
[MM]: sparse_init() should be __init.
[SPARC64]: Update defconfig.
[VIDEO]: Add Sun XVR-2500 framebuffer driver.
[VIDEO]: Add Sun XVR-500 framebuffer driver.
[SPARC64]: SUN4U PCI-E controller support.
[SPARC]: Fix comment typo in smp4m_blackbox_current().
[SCSI] SUNESP: sun_esp.c needs linux/delay.h
Fix up conflict in arch/sparc64/mm/init.c manually due to removal of
pgtable_cache_init() through the -mm patches (even though that patch was
also by David ;)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Handle MAP_FIXED in hugetlb_get_unmapped_area on sparc64 by just using
prepare_hugepage_range()
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is not necessary to tell the slab allocators to align to a cacheline
if an explicit alignment was already specified. It is rather confusing
to specify multiple alignments.
Make sure that the call sites only use one form of alignment.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch was recently posted to lkml and acked by Pekka.
The flag SLAB_MUST_HWCACHE_ALIGN is
1. Never checked by SLAB at all.
2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB
3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB.
The only remaining use is in sparc64 and ppc64 and their use there
reflects some earlier role that the slab flag once may have had. If
its specified then SLAB_HWCACHE_ALIGN is also specified.
The flag is confusing, inconsistent and has no purpose.
Remove it.
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I ported this to sparc64 as per the patch below, tested on UP SunBlade1500 and
24 cpu Niagara T1000.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
apb_calc_first_last(), apb_fake_ranges(), pci_of_scan_bus(),
of_scan_pci_bridge(), pci_of_scan_bus(), and pci_scan_one_pbm()
should all be __devinit.
Signed-off-by: David S. Miller <davem@davemloft.net>
Some minor refactoring in the generic code was necessary for
this:
1) This controller requires 8-byte access to the interrupt map
and clear register. They are 64-bits on all the other
SBUS and PCI controllers anyways, so this was easy to cure.
2) The IMAP register has a different layout and some bits that we
need to preserve, so use a read/modify/write when making
changes to the IMAP register in generic code.
3) Flushing the entire IOMMU TLB is best done with a single write
to a register on this PCI controller, add a iommu->iommu_flushinv
for this.
Still lacks MSI support, that will come later.
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits)
[PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall
[PATCH] i386: type may be unused
[PATCH] i386: Some additional chipset register values validation.
[PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split.
[PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff
[PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu
[PATCH] i386: white space fixes in i387.h
[PATCH] i386: Drop noisy e820 debugging printks
[PATCH] x86-64: Fix allnoconfig error in genapic_flat.c
[PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems
[PATCH] x86-64: Share identical video.S between i386 and x86-64
[PATCH] x86-64: Remove CONFIG_REORDER
[PATCH] x86-64: Print type and size correctly for unknown compat ioctls
[PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0)
[PATCH] i386: Little cleanups in smpboot.c
[PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning
[PATCH] x86: Use RDTSCP for synchronous get_cycles if possible
[PATCH] i386: Add X86_FEATURE_RDTSCP
[PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386
[PATCH] i386: Implement alternative_io for i386
...
Fix up trivial conflict in include/linux/highmem.h manually.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (49 commits)
[SCTP]: Set assoc_id correctly during INIT collision.
[SCTP]: Re-order SCTP initializations to avoid race with sctp_rcv()
[SCTP]: Fix the SO_REUSEADDR handling to be similar to TCP.
[SCTP]: Verify all destination ports in sctp_connectx.
[XFRM] SPD info TLV aggregation
[XFRM] SAD info TLV aggregationx
[AF_RXRPC]: Sort out MTU handling.
[AF_IUCV/IUCV] : Add missing section annotations
[AF_IUCV]: Implementation of a skb backlog queue
[NETLINK]: Remove bogus BUG_ON
[IPV6]: Some cleanups in include/net/ipv6.h
[TCP]: zero out rx_opt in tcp_disconnect()
[BNX2]: Fix TSO problem with small MSS.
[NET]: Rework dev_base via list_head (v3)
[TCP] Highspeed: Limited slow-start is nowadays in tcp_slow_start
[BNX2]: Update version and reldate.
[BNX2]: Print bus information for PCIE devices.
[BNX2]: Add 1-shot MSI handler for 5709.
[BNX2]: Restructure PHY event handling.
[BNX2]: Add indirect spinlock.
...
Cleanup of dev_base list use, with the aim to simplify making device
list per-namespace. In almost every occasion, use of dev_base variable
and dev->next pointer could be easily replaced by for_each_netdev
loop. A few most complicated places were converted to using
first_netdev()/next_netdev().
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
set_irq_msi() currently connects an irq_desc to an msi_desc. The archs call
it at some point in their setup routine, and then the generic code sets up the
reverse mapping from the msi_desc back to the irq.
set_irq_msi() should do both connections, making it the one and only call
required to connect an irq with it's MSI desc and vice versa.
The arch code MUST call set_irq_msi(), and it must do so only once it's sure
it's not going to fail the irq allocation.
Given that there's no need for the arch to return the irq anymore, the return
value from the arch setup routine just becomes 0 for success and anything else
for failure.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Allows architectures to advertise that they support MSI rather than listing
each architecture as a PCI_MSI dependency.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Let's allow page-alignment in general for per-cpu data (wanted by Xen, and
Ingo suggested KVM as well).
Because larger alignments can use more room, we increase the max per-cpu
memory to 64k rather than 32k: it's getting a little tight.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This starts bringing the PowerPC and Sparc64 implemetations back closer
together.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
It should be set to the total number of pages that the
system will really have available after things like
initmem, the bootmem map, and initrd are freed up.
Signed-off-by: David S. Miller <davem@davemloft.net>
While useful in odd circumstances to debug something, they are
normally totally unused and anyone can fetch this code out of the
history if they really need it.
And in any event, the person who needs this kind of code is usually me
:-)
Signed-off-by: David S. Miller <davem@davemloft.net>
__get_phys is only called from init.c as is prom_virt_to_phys(),
__get_iospace() is not called at all, and sun4u_get_pte() is largely
misnamed.
Privatize the implementation and helper functions of
sun4u_get_phys() to mm/init.c, and rename to
kvaddr_to_paddr().
The only used of this thing is flush_icache_range(), and thus
things can be considerably further simplified. For example,
we should only see module or PAGE_OFFSET kernel addresses here,
so we don't need the OBP firmware range handling at all.
Signed-off-by: David S. Miller <davem@davemloft.net>
Kick out empty entries as soon as we spot them, and use memmove()
instead of a silly loop to make the operation more clear.
Signed-off-by: David S. Miller <davem@davemloft.net>
Decrease the SECTION_SIZE_BITS --> MAX_PHYSADDR_BITS
range a little bit.
The cost of going to SPARSEMEM_STATIC becomes 8K of BSS space, and in
return we save a pointer dereferences on every page struct lookup.
Even better we hit the main kernel image for the base address which is
in a hugepage locked TLB entry.
Signed-off-by: David S. Miller <davem@davemloft.net>
This helps deal with the invisible bridge that sits between
the host controller and the top-most visisble PCI devices
on hypervisor systems.
For example, on T1000 the bus-range property says 2 --> 4
and so there is a PCI express bridge at bus 2, devfn 0, etc.
So if we don't force the dummy host controller to bus zero,
we'll try to create two devices with the same domain/bus/devfn
triplet.
Also, add some more log diagnostics to make debugging stuff like this
easyer.
Signed-off-by: David S. Miller <davem@davemloft.net>
We fake up a dummy one in all cases because that is the simplest
thing to do and it happens to be necessary for hypervisor systems.
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't do the "Simba APB is a PBM" bogosity for Sabre
controllers any longer, so this pbms_same_domain thing
is no longer necessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
The SIMBA APB bridge is strange, it is a PCI bridge but it lacks
some standard OF properties, in particular it lacks a 'ranges'
property.
What you have to do is read the IO and MEM range registers in
the APB bridge to determine the ranges handled by each bridge.
So fill in the bus resources by doing that.
Since we now handle this quirk in the generic PCI and OF device
probing layers, we can flat out eliminate all of that code from
the sabre pci controller driver.
In fact we can thus eliminate completely another quirk of the sabre
driver. It tried to make the two APB bridges look like PBMs but that
makes zero sense now (and it's questionable whether it ever made sense).
So now just use pbm_A and probe the whole PCI hierarchy using that as
the root.
This simplification allows many future cleanups to occur.
Also, I've found yet another quirk that needs to be worked around
while testing this. You can't use the 'class-code' OF firmware
property, especially for IDE controllers. We have to read the value
out of PCI config space or else we'll see the value the device was
showing before it was programmed into native mode.
I'm starting to think it might be wise to just read all of the values
out of PCI config space instead of using the OF properties. :-/
Signed-off-by: David S. Miller <davem@davemloft.net>
Need to traverse recursively down child busses else we only
get the file created under devices at the top-level.
Signed-off-by: David S. Miller <davem@davemloft.net>
The only user was bus_dvma_to_mem() which is no longer used
by any driver, so kill that, and the export of pci_memspace_mask.
The only user now is the PCI mmap support code.
Signed-off-by: David S. Miller <davem@davemloft.net>
Almost entirely taken from the 64-bit PowerPC PCI code.
This allowed to eliminate a ton of cruft from the sparc64
PCI layer.
Signed-off-by: David S. Miller <davem@davemloft.net>
Also, do not try to compute resources by hand, instead use
the pre-computed ones in the of_device.
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows us to simplify sharing code with powerpc which
has properties that have various forms of capitalization
when on the sparc64 side the property is all lower-case.
Signed-off-by: David S. Miller <davem@davemloft.net>
Finally, we actually change the functions themselves.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Removes days_in_mo[], as it's almost identical to month_days[]
- Use the leapyear() macro
- Line length wrapping.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I'd like to thank John Stul and others for helping
me along the way.
A lot of cleanups fell out of this. For example, the get_compare()
tick_op was totally unused, so was deleted. And the most often used
tick_op members were grouped together for cache-friendlyness.
The sparc64 TSC is given to the kernel as a one-shot timer.
tick_ops->init_timer() simply turns off the privileged bit in
the tick register (when possible), and disables the interrupt
by setting bit 63 in the compare register. The ->disable_irq()
op also sets this bit.
tick_ops->add_compare() is changed to:
1) Add the given delta to "tick" not to "compare"
2) Return a boolean which, if true, means that the tick
value read after writing the compare value was found
to have incremented past the initial tick value. This
mirrors logic used in the HPET driver's ->next_event()
method.
Each tick_ops implementation also now provides a name string.
And we feed this into the clocksource and clockevents layers.
Signed-off-by: David S. Miller <davem@davemloft.net>
Things were scattered all over the place, split between
SMP and non-SMP.
Unify it all so that dyntick support is easier to add.
Signed-off-by: David S. Miller <davem@davemloft.net>
While building a test kernel for the new esp driver (against
git-current), I hit this bug. Trivial fix, put the inline declaration
in the right place. :)
Signed-off-by: Tom "spot" Callaway <tcallawa@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not sign extend args using the sys32_ipc stub, that is
buggy and unnecessary.
Based upon an excellent report by Mikael Pettersson.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix section mismatch in arch/sparc/kernel/pcic.c and
arch/sparc64/kernel/pci.c.
Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I don't figure anyone really cares about SunOS syscall emulation, and I
certainly don't. But I'm getting rid of uses of the OPEN_MAX and CHILD_MAX
compile-time constant, and these are almost the only ones. OPEN_MAX is a
bogus constant with no meaning about anything. The RLIMIT_NOFILE resource
limit is what sysconf (_SC_OPEN_MAX) actually wants to return.
The CHILD_MAX cases weren't actually using anything I want to get rid of,
but I noticed that they are there and are wrong too. The CHILD_MAX value
is not really unlimited as a -1 return from sysconf indicates. The
RLIMIT_NPROC resource limit is what sysconf (_SC_CHILD_MAX) wants to return.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are several IOMMU allocator bugs. Instead of trying to fix this
overly complicated code, just mirror the PCI IOMMU arena allocator
which is very stable and well stress tested.
I tried to make the code as identical as possible so we can switch
sun4u PCI and SBUS over to a common piece of IOMMU code. All that
will be need are two callbacks, one to do a full IOMMU flush and one
to do a streaming buffer flush.
This patch gets rid of a lot of hangs and mysterious crashes on SBUS
sparc64 systems, at least for me.
Signed-off-by: David S. Miller <davem@davemloft.net>
The manual says that it is required and we actually have crash reports
where loads see stale data due to not having membars here.
In one case the networking does:
memset(skb, 0, offsetof(struct sk_buff, truesize));
and then some code later checks skb->nohdr for zero, but it's still
the value that was there before the memset().
Note that arch/sparc64/lib/xor.S already got this right.
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to make sure to use base-pagesize TLB entries even during the
early transition period where we need TLB miss handling but don't have
the kernel page tables setup yet for the linear region.
Also, it is necessary therefore to not use the 4MB TSB for these
translations, and instead use the normal kernel TSB. This allows us
to also get rid of the 4MB tsb for debug builds which shrinks the
kernel a little bit.
Signed-off-by: David S. Miller <davem@davemloft.net>
These pte loops all assume the passed in address is HPAGE
aligned, make sure that is actually true.
Signed-off-by: David S. Miller <davem@davemloft.net>
sys_mbind
sys_get_mempolicy
sys_set_mempolicy
sys_kexec_load
sys_move_pages
sys_getcpu
sys_epoll_pwait
This work is largely a result of David Woodhouse's most
excellent missing syscalls patch.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix atomicity of TIF update in flush_thread() for sparc64
Fixes correctly the race by using *_ti_thread_flag.
Race :
parent process executing :
sys_ptrace()
(lock_kernel())
(ptrace_get_task_struct(pid))
arch_ptrace()
ptrace_detach()
ptrace_disable(child);
clear_singlestep(child);
clear_tsk_thread_flag(child, TIF_SINGLESTEP);
(which clears the TIF_SINGLESTEP flag atomically from a different
process)
(put_task_struct(child))
(unlock_kernel())
And at the same time, in the child process :
sys_execve()
do_execve()
search_binary_handler()
load_elf_binary()
flush_old_exec()
flush_thread()
doing a non-atomic thread flag update
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
We mistakedly modify 'bus' in the innermost loop. What
should happen is that at each register index iteration,
we start with the same 'bus'.
So preserve it's value at the top level, and use a loop
local variable 'dbus' for iteration.
This bug causes registers other than the first to be
decoded improperly.
Signed-off-by: David S. Miller <davem@davemloft.net>
When the PCI controller OBP node lacks an interrupt-map
and interrupt-map-mask property, we need to form the
INO by hand. The PCI swizzle logic was not doing that
properly.
This was a regression added by the of_device code.
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts some bogosity from the dynamic command-line
changes made on sparc32 and sparc64.
Drivers such as drivers/sbus/char/openprom.c reference
saved_command_line, and can be modular.
The boot_command_line is __initdata, yet the dynamic command-line
changes add modular exports of that symbol, obviously wrong.
Signed-off-by: David S. Miller <davem@davemloft.net>
Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
[akpm@osdl.org: sparc64 fix]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I am slowly moving to a model where all process killing is struct pid based
instead of pid_t based. The sunos compatibility code is one of the last users
of the old pid_t based kill_pg in the kernel. By being complete I allow for
the future removal of kill_pg from the kernel, which will ensure I don't miss
something.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I noticed that almost all architectures implemented exactly the same
sys32_sysinfo... except parisc, where a bug was to be found in handling of
the uptime. So let's remove a whole whack of code for fun and profit.
Cribbed compat_sys_sysinfo from x86_64's implementation, since I figured it
would be the best tested.
This patch incorporates Arnd's suggestion of not using set_fs/get_fs, but
instead extracting out the common code from sys_sysinfo.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The line discipline numbers N_* are currently defined for each architecture
individually, but (except for a seeming mistake) identically, in
asm/termios.h. There is no obvious reason why these numbers should be
architecture specific, nor any apparent relationship with the termios
structure. The total number of these, NR_LDISCS, is defined in linux/tty.h
anyway. So I propose the following patch which moves the definitions of
the individual line disciplines to linux/tty.h too.
Three of these numbers (N_MASC, N_PROFIBUS_FDL, and N_SMSBLOCK) are unused
in the current kernel, but the patch still keeps the complete set in case
there are plans to use them yet.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update all arch/*/kernel/vmlinux.lds.S to not include space for initramfs
when CONFIG_BLK_DEV_INITRAMFS is not selected. This saves another 4 kbytes
on most platfoms (some reserve PAGE_SIZE for initramfs).
Signed-off-by: Jean-Paul Saman <jean-paul.saman@nxp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As Andi pointed out: CONFIG_GENERIC_ISA_DMA only disables the ISA DMA
channel management. Other functionality may still expect GFP_DMA to
provide memory below 16M. So we need to make sure that CONFIG_ZONE_DMA is
set independent of CONFIG_GENERIC_ISA_DMA. Undo the modifications to
mm/Kconfig where we made ZONE_DMA dependent on GENERIC_ISA_DMA and set
theses explicitly in each arches Kconfig.
Reviews must occur for each arch in order to determine if ZONE_DMA can be
switched off. It can only be switched off if we know that all devices
supported by a platform are capable of performing DMA transfers to all of
memory (Some arches already support this: uml, avr32, sh sh64, parisc and
IA64/Altix).
In order to switch ZONE_DMA off conditionally, one would have to establish
a scheme by which one can assure that no drivers are enabled that are only
capable of doing I/O to a part of memory, or one needs to provide an
alternate means of performing an allocation from a specific range of memory
(like provided by alloc_pages_range()) and insure that all drivers use that
call. In that case the arches alloc_dma_coherent() may need to be modified
to call alloc_pages_range() instead of relying on GFP_DMA.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
nr_free_pages is now a simple access to a global variable. Make it a macro
instead of a function.
The nr_free_pages now requires vmstat.h to be included. There is one
occurrence in power management where we need to add the include. Directly
refrer to global_page_state() there to clarify why the #include was added.
[akpm@osdl.org: arm build fix]
[akpm@osdl.org: sparc64 build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is kind of hokey, we could use the hardware provided facilities
much better.
MSIs are assosciated with MSI Queues. MSI Queues generate interrupts
when any MSI assosciated with it is signalled. This suggests a
two-tiered IRQ dispatch scheme:
MSI Queue interrupt --> queue interrupt handler
MSI dispatch --> driver interrupt handler
But we just get one-level under Linux currently. What I'd like to do
is possibly stick the IRQ actions into a per-MSI-Queue data structure,
and dispatch them form there, but the generic IRQ layer doesn't
provide a way to do that right now.
So, the current kludge is to "ACK" the interrupt by processing the
MSI Queue data structures and ACK'ing them, then we run the actual
handler like normal.
We are wasting a lot of useful information, for example the MSI data
and address are provided with ever MSI, as well as a system tick if
available. If we could pass this into the IRQ handler it could help
with certain things, in particular for PCI-Express error messages.
The MSI entries on sparc64 also tell you exactly which bus/device/fn
sent the MSI, which would be great for error handling when no
registered IRQ handler can service the interrupt.
We override the disable/enable IRQ chip methods in sun4v_msi, so we
have to call {mask,unmask}_msi_irq() directly from there. This is
another ugly wart.
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise we can't use the generic MSI code.
Furthermore, properly use the {get,set}_irq_foo() abstracted
interfaces instead of direct accesses to irq_desc[]->foo.
Signed-off-by: David S. Miller <davem@davemloft.net>
Mirror the logic in the sun4u handler, we have to update
both registers even when we branch out to window fault
fixup handling.
The way it works is that if we are in etrap processing a
fault already, g4/g5 holds the original fault information.
If we take a window spill fault while doing etrap, then
we put the window spill fault info into g4/g5 and this is
what the top-level fault handler ends up processing first.
Then we retry the originally faulting instruction, and
process the original fault at that time.
This is all necessary because of how constrained the trap
registers are in these code paths. These cases trigger
very rarely, so even if there is some performance implication
it's doesn't happen very often. In fact the rarity is why
it took so long to trigger and find this particular bug.
Signed-off-by: David S. Miller <davem@davemloft.net>
Compiling the kernel with CONFIG_HOTPLUG = y and CONFIG_HOTPLUG_CPU = n
with CONFIG_RELOCATABLE = y generates the following modpost warnings
WARNING: vmlinux - Section mismatch: reference to .init.data: from
.text between '_cpu_up' (at offset 0xc0141b7d) and 'cpu_up'
WARNING: vmlinux - Section mismatch: reference to .init.data: from
.text between '_cpu_up' (at offset 0xc0141b9c) and 'cpu_up'
WARNING: vmlinux - Section mismatch: reference to .init.text:__cpu_up
from .text between '_cpu_up' (at offset 0xc0141bd8) and 'cpu_up'
WARNING: vmlinux - Section mismatch: reference to .init.data: from
.text between '_cpu_up' (at offset 0xc0141c05) and 'cpu_up'
WARNING: vmlinux - Section mismatch: reference to .init.data: from
.text between '_cpu_up' (at offset 0xc0141c26) and 'cpu_up'
WARNING: vmlinux - Section mismatch: reference to .init.data: from
.text between '_cpu_up' (at offset 0xc0141c37) and 'cpu_up'
This is because cpu_up, _cpu_up and __cpu_up (in some architectures) are
defined as __devinit
AND
__cpu_up calls some __cpuinit functions.
Since __cpuinit would map to __init with this kind of a configuration,
we get a .text refering .init.data warning.
This patch solves the problem by converting all of __cpu_up, _cpu_up
and cpu_up from __devinit to __cpuinit. The approach is justified since
the callers of cpu_up are either dependent on CONFIG_HOTPLUG_CPU or
are of __init type.
Thus when CONFIG_HOTPLUG_CPU=y, all these cpu up functions would land up
in .text section, and when CONFIG_HOTPLUG_CPU=n, all these functions would
land up in .init section.
Tested on a i386 SMP machine running linux-2.6.20-rc3-mm1.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
And this points out that the return value from
isa_dev_get_resource() and the 'pregs' arg to
isa_dev_get_irq() are totally unused.
Based upon a patch from Richard Mortimer <richm@oldelvet.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to pass in the resource otherwise we cannot
release the region properly. We must know whether it is
an I/O or MEM resource.
Spotted by Eric Brower.
Signed-off-by: David S. Miller <davem@davemloft.net>
We were not being careful enough. When we trim the physical
memory areas, we have to make sure we don't remove the kernel
image or initial ramdisk image ranges.
Signed-off-by: David S. Miller <davem@davemloft.net>
The apple fn keys don't work anymore with 2.6.20-rc1.
The reason is that USB_HID_POWERBOOK appears in several files although
USB_HIDINPUT_POWERBOOK is the thing to be used.
The patch fixes this.
Cc: Greg KH <greg@kroah.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It branches around some necessary prom calls, which we would
need to do even if we are mapped at the correct location already.
So it doesn't work.
The idea was that this sort of thing could be used for the eventual
kexec implementation, but it is clear that this will need to be
done differently.
Signed-off-by: David S. Miller <davem@davemloft.net>
Run this:
#!/bin/sh
for f in $(grep -Erl "\([^\)]*\) *k[cmz]alloc" *) ; do
echo "De-casting $f..."
perl -pi -e "s/ ?= ?\([^\)]*\) *(k[cmz]alloc) *\(/ = \1\(/" $f
done
And then go through and reinstate those cases where code is casting pointers
to non-pointers.
And then drop a few hunks which conflicted with outstanding work.
Cc: Russell King <rmk@arm.linux.org.uk>, Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Paul Fulghum <paulkf@microgate.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Steven French <sfrench@us.ibm.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- relbranch_fixup(), for non-branches, would end up setting
regs->tnpc incorrectly, in fact it would set it equal to
regs->tpc which would cause that instruction to execute twice
Also, if this is not a PC-relative branch, we should just
leave regs->tnpc as-is. This covers cases like 'jmpl' which
branch to absolute values.
- To be absolutely %100 safe, we need to flush the instruction
cache for all assignments to kprobe->ainsn.insn[], including
cases like add_aggr_kprobe()
- prev_kprobe's status field needs to be 'unsigned long' to match
the type of the value it is saving
- jprobes were totally broken:
= jprobe_return() can run in the stack frame of the jprobe handler,
or in an even deeper stack frame, thus we'll be in the wrong
register window than the one from the original probe state.
So unwind using 'restore' instructions, if necessary, right
before we do the jprobe_return() breakpoint trap.
= There is no reason to save/restore the register window saved
at %sp at jprobe trigger time. Those registers cannot be
modified by the jprobe handler. Also, this code was saving
and restoring "sizeof (struct sparc_stackf)" bytes. Depending
upon the caller, this could clobber unrelated stack frame
pieces if there is only a basic 128-byte register window
stored on the stack, without the argument save area.
So just saving and restoring struct pt_regs is sufficient.
= Kill the "jprobe_saved_esp", totally unused.
Also, delete "jprobe_saved_regs_location", with the stack frame
unwind now done explicitly by jprobe_return(), this check is
superfluous.
Signed-off-by: David S. Miller <davem@davemloft.net>
ptrace_traceme() consolidation made
ret = ptrace_traceme();
dead write.
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Userspace is forbidden from making unaligned loads and
stores. So if we get an unaligned trap due to a
{get,put}_user(), signal a fault and run the exception
handler.
Signed-off-by: David S. Miller <davem@davemloft.net>
To add this logic, put the VIS instruction check at the
vis_emul() call site instead of inside of vis_emul().
Signed-off-by: David S. Miller <davem@davemloft.net>
This facility provides three entry points:
ilog2() Log base 2 of unsigned long
ilog2_u32() Log base 2 of u32
ilog2_u64() Log base 2 of u64
These facilities can either be used inside functions on dynamic data:
int do_something(long q)
{
...;
y = ilog2(x)
...;
}
Or can be used to statically initialise global variables with constant values:
unsigned n = ilog2(27);
When performing static initialisation, the compiler will report "error:
initializer element is not constant" if asked to take a log of zero or of
something not reducible to a constant. They treat negative numbers as
unsigned.
When not dealing with a constant, they fall back to using fls() which permits
them to use arch-specific log calculation instructions - such as BSR on
x86/x86_64 or SCAN on FRV - if available.
[akpm@osdl.org: MMC fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Howells <dhowells@redhat.com>
Cc: Wojtek Kaniewski <wojtekka@toxygen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix the locking of signal->tty.
Use ->sighand->siglock to protect ->signal->tty; this lock is already used
by most other members of ->signal/->sighand. And unless we are 'current'
or the tasklist_lock is held we need ->siglock to access ->signal anyway.
(NOTE: sys_unshare() is broken wrt ->sighand locking rules)
Note that tty_mutex is held over tty destruction, so while holding
tty_mutex any tty pointer remains valid. Otherwise the lifetime of ttys
are governed by their open file handles. This leaves some holes for tty
access from signal->tty (or any other non file related tty access).
It solves the tty SLAB scribbles we were seeing.
(NOTE: the change from group_send_sig_info to __group_send_sig_info needs to
be examined by someone familiar with the security framework, I think
it is safe given the SEND_SIG_PRIV from other __group_send_sig_info
invocations)
[schwidefsky@de.ibm.com: 3270 fix]
[akpm@osdl.org: various post-viro fixes]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Alan Cox <alan@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Define elf_addr_t in linux/elf.h. The size of the type is determined using
ELF_CLASS. This allows us to remove the defines that today are spread all
over .c and .h files.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Cc: Daniel Jacobowitz <drow@false.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#
set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done
The script was run like this
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Following up with the work on shared page table done by Dave McCracken. This
set of patch target shared page table for hugetlb memory only.
The shared page table is particular useful in the situation of large number of
independent processes sharing large shared memory segments. In the normal
page case, the amount of memory saved from process' page table is quite
significant. For hugetlb, the saving on page table memory is not the primary
objective (as hugetlb itself already cuts down page table overhead
significantly), instead, the purpose of using shared page table on hugetlb is
to allow faster TLB refill and smaller cache pollution upon TLB miss.
With PT sharing, pte entries are shared among hundreds of processes, the cache
consumption used by all the page table is smaller and in return, application
gets much higher cache hit ratio. One other effect is that cache hit ratio
with hardware page walker hitting on pte in cache will be higher and this
helps to reduce tlb miss latency. These two effects contribute to higher
application performance.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Cc: Dave McCracken <dmccr@us.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Print the addresses of non-absolute symbols relative to _text
so that ld will generate relocations. Allowing a relocatable
kernel to relocate them. We can't actually use the symbol names
because kallsyms includes static symbols that are not exported
from their object files.
Add the _text symbol definitions to the architectures which don't
define it otherwise linker will fail.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
The setting of the CACHE_LINE_SIZE register in sparc64's pci
initialisation code isn't quite adequate as the device may have
incompatible requirements. The generic code tests for this, so switch
sparc64 over to using it.
Since sparc64 has different L1 cache line size and PCI cache line size,
it would need to override the generic code like i386 and ia64 do. We
know what the cache line size is at compile time though, so introduce a
new optional constant PCI_CACHE_LINE_BYTES.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: David Miller <davem@davemloft.net>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When I added the entries for the robust futex syscall entries, I
forgot to bump NR_SYSCALLS. The current situation is error-prone
because NR_SYSCALLS lives in entry.S where the system call limit
checks are enforced. Move the definition to asm/unistd.h in order to
make this mistake much more difficult to make.
And wire up sys_migrate_pages since the powerpc folks implemented the
compat wrapper for us.
Signed-off-by: David S. Miller <davem@davemloft.net>
The code in schizo_irq_trans_init() should set irq_data->sync_reg
to the location of the SYNC register if this is Tomatillo, and set
it to zero otherwise. But that is not what it is doing.
As a result, non-Tomatillo systems were trying to access a
non-existent register resulting in bus errors at the first
PCI interrupt.
Thanks to Roland Stigge for the bug report.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a vmlinux.lds.h helper macro for defining the eight-level initcall table,
teach all the architectures to use it.
This is a prerequisite for a patch which performs initcall synchronisation for
multithreaded-probing.
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
[ Added AVR32 as well ]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
dp->path_component_name can be larger than ->bus_id[]
so use a different naming scheme for this stuff.
Noticed by Jurij Smakov.
Signed-off-by: David S. Miller <davem@davemloft.net>
The second argument to free_npages() was being incorrectly
calculated, which would thus access far past the end of the
arena->map[] bitmap.
Signed-off-by: David S. Miller <davem@davemloft.net>
1) probe_other_fhcs() wants to see only non-central FHC
busses, so skip FHCs that don't sit off the root
2) Like SBUS, FHC can lack the appropriate address and
size cell count properties, so add an of_busses[]
entry and handlers for that.
3) Central FHC irq translator probing was buggy. We
were trying to use dp->child in irq_trans_init but
that linkage is not setup at this point.
So instead, pass in the parent of "dp" and look for
the child "fhc" with parent "central".
Thanks to the tireless assistence of Ben Collins in tracking
down these problems and testing out these fixes.
Signed-off-by: David S. Miller <davem@davemloft.net>
For Hummingbird PCI controllers, we should create the root
PCI memory space resource as the full 4GB area, and then
allocate the IOMMU DMA translation window out of there.
The old code just assumed that the IOMMU DMA translation base
to the top of the 4GB area was unusable. This is not true on
many systems such as SB100 and SB150, where the IOMMU DMA
translation window sits at 0xc0000000->0xdfffffff.
So what would happen is that any device mapped by the firmware
at the top section 0xe0000000->0xffffffff would get remapped
by Linux somewhere else leading to all kinds of problems and
boot failures.
While we're here, report more cases of OBP resource assignment
conflicts. The only truly valid ones are ROM resource conflicts.
Signed-off-by: David S. Miller <davem@davemloft.net>
Unused, but still allow the '-s' boot option to be passed
down to init.
Based upon patches by Martin Habets.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix this 2.6.19-rc1 build warnings from modpost:
WARNING: vmlinux - Section mismatch: reference to .init.text:sunzilog_console_setup from .data between 'sunzilog_console' (at offset 0x8394) and 'devices_subsys'
Signed-off-by: Martin Habets <errandir_news@mph.eclipse.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
They have to be unique system-wide, so use
"NAME@NODE" as the string pattern of the non-root
nodes.
Thanks to Andrew Morton for fixing the error value
checking in bus_add_device() which made this problem
finally noticable.
Signed-off-by: David S. Miller <davem@davemloft.net>