This splits off several pieces of code that parse the
ibm,dynamic-reconfiguration-memory node of the device tree into separate
helper routines. This is in preparation for the next commit that will
use these helper routines. There are no functional changes in this patch.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently the kernel fails to build with the above config options with:
CC arch/powerpc/mm/mem.o
arch/powerpc/mm/mem.c: In function 'arch_add_memory':
arch/powerpc/mm/mem.c:130: error: implicit declaration of function 'create_section_mapping'
This explicitly includes asm/sparsemem.h in arch/powerpc/mm/mem.c and
moves the guards in include/asm-powerpc/sparsemem.h to protect the
SPARSEMEM specific portions only.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At present, if we have a kernel with a 64kB page size, and some
process maps something that has to be mapped with 4kB pages (such as a
cache-inhibited mapping on POWER5+, or the eHCA infiniband queue-pair
pages), we change the process to use 4kB pages everywhere. This hurts
the performance of HPC programs that access eHCA from userspace.
With this patch, the kernel will only demote the slice(s) containing
the eHCA or cache-inhibited mappings, leaving the remaining slices
able to use 64kB hardware pages.
This also changes the slice_get_unmapped_area code so that it is
willing to place a 64k-page mapping into (or across) a 4k-page slice
if there is no better alternative, i.e. if the program specified
MAP_FIXED or if there is not sufficient space available in slices that
are either empty or already have 64k-page mappings in them.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
While working on the 36-bit physical support, I noticed that there
was exactly one line of code that actually referenced the bitfields.
So I got rid of them and redefined ppc_bat as a struct of 2 u32's:
batu and batl. I also got rid of the previous union that held the
bitfield structs and a word representation of the batu/l values.
This seems like a nicer solution than adding in a bunch of
new bitfields to support extended bat addressing that would never
get used, and just leaving the struct as-is would have been
incomplete in the face of large physical addressing.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, the physical address is an unsigned long, but it should
be phys_addr_t in set_bat, [v/p]_mapped_by_bat. Also, create a
macro that can convert a large physical address into the correct
format for programming the BAT registers.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This frees a PTE bit when using 64K pages on ppc64. This is done
by getting rid of the separate _PAGE_HASHPTE bit. Instead, we just test
if any of the 16 sub-page bits is set. For non-combo pages (ie. real
64K pages), we set SUB0 and the location encoding in that field.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
When we demote a slice from 64k to 4k, and we are about to insert an
HPTE for a 4k subpage and we notice that there is an existing 64k
HPTE, we first invalidate that HPTE before inserting the new 4k
subpage HPTE. Since the bits that encode which hash bucket the old
HPTE was in overlap with the bits that encode which of the 16 subpages
have HPTEs, we need to clear out the subpage HPTE-present bits before
starting to insert HPTEs for the 4k subpages. If we don't do that, we
can erroneously think that a subpage already has an HPTE when it
doesn't.
That in itself wouldn't be such a problem except that when we go to
update the HPTE that we think is present on machines with a
hypervisor, the hypervisor can tell us that the HPTE we think is there
is actually there even though it isn't, which can lead to a process
getting stuck in a loop, continually faulting. The reason for the
confusion is that the AVPN (abbreviated virtual page number) we are
looking for in the HPTE for a 4k subpage can actually match the AVPN
in a stale HPTE for another 64k page. For example, the HPTE for
the 4k subpage at 0x84000f000 will be in the same hash bucket and have
the same AVPN as the HPTE for the 64k page at 0x8400f0000.
This fixes the code to clear out the subpage HPTE-present bits.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The ehea driver was recently changed[1] to use walk_memory_resource() to
detect the system's memory layout. However, walk_memory_resource() is
available only when memory hotplug is enabled. So CONFIG_EHEA was
made to depend on MEMORY_HOTPLUG [2], but it is inappropriate for a
network driver to have such a dependency.
Make the declaration of walk_memory_resource() and its powerpc
implementation (ehea is powerpc-specific) unconditionally available.
[1] 48cfb14f8b
"ehea: Add DLPAR memory remove support"
[2] fb7b6ca2b6
"ehea: Add dependency to Kconfig"
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
__set_fixmap() in pgtable_32.c currently fails to compile if
STRICT_MM_TYPECHECKS is defined. This fixes it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This removes a CVS keyword that wasn't updated for a long time from a
comment.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This changes vmemmap to use a different region (region 0xf) of the
address space, and to configure the page size of that region
dynamically at boot.
The problem with the current approach of always using 16M pages is that
it's not well suited to machines that have small amounts of memory such
as small partitions on pseries, or PS3's.
In fact, on the PS3, failure to allocate the 16M page backing vmmemmap
tends to prevent hotplugging the HV's "additional" memory, thus limiting
the available memory even more, from my experience down to something
like 80M total, which makes it really not very useable.
The logic used by my match to choose the vmemmap page size is:
- If 16M pages are available and there's 1G or more RAM at boot,
use that size.
- Else if 64K pages are available, use that
- Else use 4K pages
I've tested on a POWER6 (16M pages) and on an iSeries POWER3 (4K pages)
and it seems to work fine.
Note that I intend to change the way we organize the kernel regions &
SLBs so the actual region will change from 0xf back to something else at
one point, as I simplify the SLB miss handler, but that will be for a
later patch.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Somewhere along the way (e28f7faf05,
"Four level pagetables for ppc64") we ended up with duplicate
definitions for pte_freelist_cur and pte_freelist_force_free.
Somehow this compiles, but it would be better to just have one
definition for each.
The two definitions we end up with can be static too!
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
... instead of having extern declarations in a .c file.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Make two vmemmap helpers static in init_64.c
Make stab variables static in stab.c
Make psize defs static in hash_utils_64.c
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
... instead of having an extern declaration in a .c file.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes a regression reported by Kamalesh Bulabel where a POWER4
machine would crash because of an SLB miss at a point where the SLB
miss exception was unrecoverable. This regression is tracked at:
http://bugzilla.kernel.org/show_bug.cgi?id=10082
SLB misses at such points shouldn't happen because the kernel stack is
the only memory accessed other than things in the first segment of the
linear mapping (which is mapped at all times by entry 0 of the SLB).
The context switch code ensures that SLB entry 2 covers the kernel
stack, if it is not already covered by entry 0. None of entries 0
to 2 are ever replaced by the SLB miss handler.
Where this went wrong is that the context switch code assumes it
doesn't have to write to SLB entry 2 if the new kernel stack is in the
same segment as the old kernel stack, since entry 2 should already be
correct. However, when we start up a secondary cpu, it calls
slb_initialize, which doesn't set up entry 2. This is correct for
the boot cpu, where we will be using a stack in the kernel BSS at this
point (i.e. init_thread_union), but not necessarily for secondary
cpus, whose initial stack can be allocated anywhere. This doesn't
cause any immediate problem since the SLB miss handler will just
create an SLB entry somewhere else to cover the initial stack.
In fact it's possible for the cpu to go quite a long time without SLB
entry 2 being valid. Eventually, though, the entry created by the SLB
miss handler will get overwritten by some other entry, and if the next
access to the stack is at an unrecoverable point, we get the crash.
This fixes the problem by making slb_initialize create a suitable
entry for the kernel stack, if we are on a secondary cpu and the stack
isn't covered by SLB entry 0. This requires initializing the
get_paca()->kstack field earlier, so I do that in smp_create_idle
where the current field is initialized. This also abstracts a bit of
the computation that mk_esid_data in slb.c does so that it can be used
in slb_initialize.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Arrange for a syntax check to always be done on the powerpc/mm/slb.c
DBG() macro by defining it to pr_debug() for non-debug builds.
Also, fix these related compile warnings:
slb.c:273: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'long unsigned int
slb.c:274: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'long unsigned int'
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Provide walk_memory_resource() for 64-bit powerpc. PowerPC maintains
logical memory region mapping in the lmb.memory structure. Walk
through these structures and do the callbacks for the contiguous
chunks.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
All architectures use an effectively identical definition of online_page(), so
just make it common code. x86-64, ia64, powerpc and sh are actually
identical; x86-32 is slightly different.
x86-32's differences arise because it puts its hotplug pages in the highmem
zone. We can handle this in the generic code by inspecting the page to see if
its in highmem, and update the totalhigh_pages count appropriately. This
leaves init_32.c:free_new_highpage with a single caller, so I folded it into
add_one_highpage_init.
I also removed an incorrect comment referring to the NUMA case; any NUMA
details have already been dealt with by the time online_page() is called.
[akpm@linux-foundation.org: fix indenting]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Tested-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use (31-THREAD_SHIFT) to get to thread_info from stack pointer. This makes
the code a bit easier to read and more robust if we ever change THREAD_SHIFT.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The fixmap code from x86 allows us to have compile time virtual addresses
that we change the physical addresses of at run time.
This is useful for applications like kmap_atomic, PCI config that is done
via direct memory map, kexec/kdump.
We got ride of CONFIG_HIGHMEM_START as we can now determine a more optimal
location for PKMAP_BASE based on where the fixmap addresses start and
working back from there.
Additionally, the kmap code in asm-powerpc/highmem.h always had debug
enabled. Moved to using CONFIG_DEBUG_HIGHMEM to determine if we should
have the extra debug checking.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Added support to allow an 85xx kernel to be run from a non-zero physical
address (useful for cooperative asymmetric multiprocessing situations and
kdump). The support can be configured at compile time by setting
CONFIG_PAGE_OFFSET, CONFIG_KERNEL_START, and CONFIG_PHYSICAL_START as
desired.
Alternatively, the kernel build can set CONFIG_RELOCATABLE. Setting this
config option causes the kernel to determine at runtime the physical
addresses of CONFIG_PAGE_OFFSET and CONFIG_KERNEL_START. If
CONFIG_RELOCATABLE is set, then CONFIG_PHYSICAL_START has no meaning.
However, CONFIG_PHYSICAL_START will always be used to set the LOAD program
header physical address field in the resulting ELF image.
Currently we are limited to running at a physical address that is a
multiple of 256M. This is due to how we map TLBs to cover
lowmem. This should be fixed to allow 64M or maybe even 16M alignment
in the future. It is considered an error to try and run a kernel at a
non-aligned physical address.
All the magic for this support is accomplished by proper initialization
of the kernel memory subsystem and use of ARCH_PFN_OFFSET.
The use of ARCH_PFN_OFFSET only affects normal memory and not IO mappings.
ioremap uses map_page and isn't affected by ARCH_PFN_OFFSET.
/dev/mem continues to allow access to any physical address in the system
regardless of how CONFIG_PHYSICAL_START is set.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
numa.c requires routines declared in linux/of.h, so should include it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove the __max_memory variable, as it is not referenced anywhere
in the tree besides some code in arch/ppc.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When we moved to arch/powerpc we actively tried to avoid using the
ppc_md.setup_io_mappings(). Currently no board ports use it so let's
remove it to avoid any new boards using it.
Also, remove early_serial_map() since we don't even have a call out for
it in arch/powerpc.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We always use __initial_memory_limit as an address so rename it
to be clear.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* Determine the RPN we are running the kernel at runtime rather
than using compile time constant for initial TLB
* Cleanup adjust_total_lowmem() to respect memstart_addr and
be a bit more clear on variables that are sizes vs addresses.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
total_lowmem represents the amount of low memory, not the physical
address that low memory ends at. If the start of memory is at 0 it
happens that total_lowmem can be used as both the size and the address
that lowmem ends at (or more specifically one byte beyond the end).
To make the code a bit more clear and deal with the case when the start of
memory isn't at physical 0, we introduce lowmem_end_addr that represents
one byte beyond the last physical address in the lowmem region.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A number of users of PPC_MEMSTART (40x, ppc_mmu_32) can just always
use 0 as we don't support booting these kernels at non-zero physical
addresses since their exception vectors must be at 0 (or 0xfffx_xxxx).
For the sub-arches that support relocatable interrupt vectors
(book-e), it's reasonable to have memory start at a non-zero physical
address. For those cases use the variable memstart_addr instead of
the #define PPC_MEMSTART since the only uses of PPC_MEMSTART are for
initialization and in the future we can set memstart_addr at runtime
to have a relocatable kernel.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This eliminates a warning in builds that don't define
CONFIG_MEMORY_HOTPLUG.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
hash_page_sync() takes and releases the low level mmu hash
lock in order to sync with other processors disposing of page
tables. Because that lock can be needed to service hash misses
triggered by interrupt handlers, taking it must be done with
interrupts off. However, hash_page_sync() appears to be called
with interrupts enabled, thus causing occasional deadlocks.
We fix it by making sure hash_page_sync() masks interrupts while
holding the lock.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
show_mem() has no need to print the amount of free swap space manually
because show_free_areas() does this already and is called by the
former.
The two outputs only differ in text formatting:
printk("Free swap = %lukB\n", ...);
printk("Free swap: %6ldkB\n", ...);
Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
WARNING: vmlinux.o(.text+0xb41b0): Section mismatch in reference from the
function .add_memory() to the function .devinit.text:.arch_add_memory()
The function .add_memory() references
the function __devinit .arch_add_memory().
This is often because .add_memory lacks a __devinit
annotation or the annotation of .arch_add_memory is wrong.
arch_add_memory() is also not __devinit on other architectures
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If the platform doesn't support hpte_removebolted(), gracefully
return failure rather than success.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On pSeries, the hypervisor doesn't let us map in the eHEA ethernet
adapter using 64k pages, and thus the ehea driver will fail if 64k
pages are configured. This works around the problem by always
using 4k pages for ioremap on pSeries (but not on other platforms).
A better fix would be to check whether the partition could ever
have an eHEA adapter, and only force 4k pages if it could, but this
will do for 2.6.25.
This is based on an earlier patch by Tony Breeds.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since the PMU is an NMI now, it can come at any time we are only soft
disabled. We must hard disable around the two places we allow the kernel
stack SLB and r1 to go out of sync. Otherwise the PMU exception can
force a kernel stack SLB into another slot, which can lead to it
getting evicted, which can lead to a nasty unrecoverable SLB miss
in the exception entry code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
My recent hack to allocate the hash table under 1GB on cell was poorly
tested, *cough*. It turns out on blades with large amounts of memory we
fail to allocate the hash table at all. This is because RTAS has been
instantiated just below 768MB, and 0-x MB are used by the kernel,
leaving no areas that are both large enough and also naturally-aligned.
For the cell IOMMU hack the page tables must be under 2GB, so use that
as the limit instead. This has been tested on real hardware and boots
happily.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
For memory remove, we need to clean up htab mappings for the
section of the memory we are removing.
This implements support for removing htab bolted mappings for pSeries
logical partitions. Other sub-archs may need to implement similar
functionality for hotplug memory remove to work on them.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Add arch-specific walk_memory_remove() for 64-bit powerpc
[POWERPC] Enable hotplug memory remove for 64-bit powerpc
[POWERPC] Add remove_memory() for 64-bit powerpc
[POWERPC] Make cell IOMMU fixed mapping printk more useful
[POWERPC] Fix potential cell IOMMU bug when switching back to default DMA ops
[POWERPC] Don't enable cell IOMMU fixed mapping if there are no dma-ranges
[POWERPC] Fix cell IOMMU null pointer explosion on old firmwares
[POWERPC] spufs: Fix timing dependent false return from spufs_run_spu
[POWERPC] spufs: No need to have a runnable SPU for libassist update
[POWERPC] spufs: Update SPU_Status[CISHP] in backing runcntl write
[POWERPC] spufs: Fix state_mutex leaks
[POWERPC] Disable G5 NAP mode during SMU commands on U3
Background: I've implemented 1K/2K page tables for s390. These sub-page
page tables are required to properly support the s390 virtualization
instruction with KVM. The SIE instruction requires that the page tables
have 256 page table entries (pte) followed by 256 page status table entries
(pgste). The pgstes are only required if the process is using the SIE
instruction. The pgstes are updated by the hardware and by the hypervisor
for a number of reasons, one of them is dirty and reference bit tracking.
To avoid wasting memory the standard pte table allocation should return
1K/2K (31/64 bit) and 2K/4K if the process is using SIE.
Problem: Page size on s390 is 4K, page table size is 1K or 2K. That means
the s390 version for pte_alloc_one cannot return a pointer to a struct
page. Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
cannot return a pointer to a pte either, since that would require more than
32 bit for the return value of pte_alloc_one (and the pte * would not be
accessible since its not kmapped).
Solution: The only solution I found to this dilemma is a new typedef: a
pgtable_t. For s390 pgtable_t will be a (pte *) - to be introduced with a
later patch. For everybody else it will be a (struct page *). The
additional problem with the initialization of the ptl lock and the
NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
a destructor pgtable_page_dtor. The page table allocation and free
functions need to call these two whenever a page table page is allocated or
freed. pmd_populate will get a pgtable_t instead of a struct page pointer.
To get the pgtable_t back from a pmd entry that has been installed with
pmd_populate a new function pmd_pgtable is added. It replaces the pmd_page
call in free_pte_range and apply_to_pte_range.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
walk_memory_resource() verifies if there are holes in a given memory
range, by checking against /proc/iomem. On x86/ia64 system memory is
represented in /proc/iomem. On powerpc, we don't show system memory as
IO resource in /proc/iomem - instead it's maintained in
/proc/device-tree.
This provides a way for an architecture to provide its own
walk_memory_resource() function. On powerpc, the memory region is
small (16MB), contiguous and non-overlapping. So extra checking
against the device-tree is not needed.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@gate.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Supply remove_memory() function for 64-bit powerpc. This is still
not quite complete as it needs to do some more arch-specific stuff,
which will be added in a later patch.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patchset adds a flags variable to reserve_bootmem() and uses the
BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions
between crashkernel area and already used memory.
This patch:
Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE.
If that flag is set, the function returns with -EBUSY if the memory already
has been reserved in the past. This is to avoid conflicts.
Because that code runs before SMP initialisation, there's no race condition
inside reserve_bootmem_core().
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: <linux-arch@vger.kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here's a dumb simple implementation of fake NUMA nodes for PowerPC.
Fake NUMA nodes can be specified using the following command line
option
numa=fake=<node range>
node range is of the format <range1>,<range2>,...<rangeN>
Each of the rangeX parameters is passed using memparse(). I find the
patch useful for fake NUMA emulation on my simple PowerPC machine.
I've tested it on a numa box with the following arguments
numa=fake=512M
numa=fake=512M,768M
numa=fake=256M,512M mem=512M
numa=fake=1G mem=768M
numa=fake=
without any numa= argument
The other side-effect introduced by this patch is that; in the case
where we don't have NUMA information, we now set a node online after
adding each LMB. This node could very well be node 0, but in the case
that we enable fake NUMA nodes, when we cross node boundaries, we need
to set the new node online.
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, update_mmu_cache will crash if given a no-access PTE.
There's no need to synchronize dcache/icache unless it's an exec
mapping -- however, due to the existence of older glibc versions that
execute out of a read-but-no-exec page, readability is tested instead.
This assumes no exec-only mappings; if such mappings become supported,
they will need to go through the kmap_atomic() version of
dcache/icache synchronization.
This fixes a bug reported by some users where the kernel would crash
while dumping core on a threaded program.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(with Martin Schwidefsky <schwidefsky@de.ibm.com>)
The pgd/pud/pmd/pte page table allocation functions get a mm_struct pointer as
first argument. The free functions do not get the mm_struct argument. This
is 1) asymmetrical and 2) to do mm related page table allocations the mm
argument is needed on the free function as well.
[kamalesh@linux.vnet.ibm.com: i386 fix]
[akpm@linux-foundation.org: coding-syle fixes]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In order to support the fixed IOMMU mapping (in a subsequent patch),
we need the hash table to be inside the IOMMUs DMA window. This is
usually 2G, but let's make sure the hash table is under 1G as that
will satisfy the IOMMU requirements and also means the hash table will
be on node 0.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This reverts commit 5c3f5892a2,
basically because it changes behaviour even when no fake NUMA
information is specified on the kernel command line.
Firstly, it changes the nid, thus destroying the real NUMA
information. Secondly, it also changes behaviour in that if a node
ends up with no memory in it because of the memory limit, we used to
set it online and now we don't.
Also, in the non-NUMA case with no fake NUMA information, we do
node_set_online once for each LMB now, whereas previously we only did
it once. I don't know if that is actually a problem, but it does seem
unnecessary.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This makes the setjmp/longjmp code used by xmon, generically available
to other code. It also removes the requirement for debugger hooks to
be only called on 0x300 (data storage) exception.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The e500 MMU init code previously assumed KERNELBASE always equaled
PAGE_OFFSET and PHYSICAL_START was 0. This is useful for kdump
support as well as asymetric multicore.
For the initial kdump support the secondary kernel will run at 32M
but need access to all of memory so we bump the initial TLB up to
64M. This also matches with the forth coming ePAPR spec.
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
There were several issues if a memreserve range existed and happened
to be in highmem:
* The bootmem allocator is only aware of lowmem so calling
reserve_bootmem with a highmem address would cause a BUG_ON
* All highmem pages were provided to the buddy allocator
Added a lmb_is_reserved() api that we now use to determine if a highem
page should continue to be PageReserved or provided to the buddy
allocator.
Also, we incorrectly reported the amount of pages reserved since all
highmem pages are initally marked reserved and we clear the
PageReserved flag as we "free" up the highmem pages.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Using 64k pages on 64-bit PowerPC systems makes life difficult for
emulators that are trying to emulate an ISA, such as x86, which use a
smaller page size, since the emulator can no longer use the MMU and
the normal system calls for controlling page protections. Of course,
the emulator can emulate the MMU by checking and possibly remapping
the address for each memory access in software, but that is pretty
slow.
This provides a facility for such programs to control the access
permissions on individual 4k sub-pages of 64k pages. The idea is
that the emulator supplies an array of protection masks to apply to a
specified range of virtual addresses. These masks are applied at the
level where hardware PTEs are inserted into the hardware page table
based on the Linux PTEs, so the Linux PTEs are not affected. Note
that this new mechanism does not allow any access that would otherwise
be prohibited; it can only prohibit accesses that would otherwise be
allowed. This new facility is only available on 64-bit PowerPC and
only when the kernel is configured for 64k pages.
The masks are supplied using a new subpage_prot system call, which
takes a starting virtual address and length, and a pointer to an array
of protection masks in memory. The array has a 32-bit word per 64k
page to be protected; each 32-bit word consists of 16 2-bit fields,
for which 0 allows any access (that is otherwise allowed), 1 prevents
write accesses, and 2 or 3 prevent any access.
Implicit in this is that the regions of the address space that are
protected are switched to use 4k hardware pages rather than 64k
hardware pages (on machines with hardware 64k page support). In fact
the whole process is switched to use 4k hardware pages when the
subpage_prot system call is used, but this could be improved in future
to switch only the affected segments.
The subpage protection bits are stored in a 3 level tree akin to the
page table tree. The top level of this tree is stored in a structure
that is appended to the top level of the page table tree, i.e., the
pgd array. Since it will often only be 32-bit addresses (below 4GB)
that are protected, the pointers to the first four bottom level pages
are also stored in this structure (each bottom level page contains the
protection bits for 1GB of address space), so the protection bits for
addresses below 4GB can be accessed with one fewer loads than those
for higher addresses.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds the hugepagesz boot-time parameter for ppc64. It lets one
pick the size for huge pages. The choices available are 64K and 16M
when the base page size is 4k. It defaults to 16M (previously the
only only choice) if nothing or an invalid choice is specified.
Tested 64K huge pages successfully with the libhugetlbfs 1.2.
Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit 473980a993 added a call to clear
the SLB shadow buffer before registering it. Unfortunately this means
that we clear out the entries that slb_initialize has previously set in
there. On POWER6, the hypervisor uses the SLB shadow buffer when doing
partition switches, and that means that after the next partition switch,
each non-boot CPU has no SLB entries to map the kernel text and data,
which causes it to crash.
This fixes it by reverting most of 473980a9 and instead clearing the
3rd entry explicitly in slb_initialize. This fixes the problem that
473980a9 was trying to solve, but without breaking POWER6.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Before we register the SLB shadow buffer, we need to invalidate the
entries in the buffer, otherwise we can end up stale entries from when
we previously offlined the CPU.
This does this invalidate as well as unregistering the buffer with
PHYP before we offline the cpu. Tested and fixes crashes seen on
970MP (thanks to tonyb) and POWER5.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Here's a dumb simple implementation of fake NUMA nodes for PowerPC.
Fake NUMA nodes can be specified using the following command line option
numa=fake=<node range>
node range is of the format <range1>,<range2>,...<rangeN>
Each of the rangeX parameters is passed using memparse(). I find this
useful for fake NUMA emulation on my simple PowerPC machine. I've
tested it on a non-numa box with the following arguments:
numa=fake=1G
numa=fake=1G,2G
name=fake=1G,512M,2G
numa=fake=1500M,2800M mem=3500M
numa=fake=1G mem=512M
numa=fake=1G mem=1G
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently we hardwire the number of SLBs to 64, but PAPR says we
should use the ibm,slb-size property to obtain the number of SLB
entries. This uses this property instead of assuming 64. If no
property is found, we assume 64 entries as before.
This soft patches the SLB handler, so it shouldn't change performance
at all.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
My changes to _tlbie to fix 4xx unfortunately broke 8xx build in a
couple of places. This fixes it.
Spotted by Olof Johansson.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Vitaly Bordug <vitb@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Include <asm/iseries/hv_call.h> in arch/powerpc/mm/stab.c to fix the
following compile error (found with randconfig):
CC arch/powerpc/mm/stab.o
arch/powerpc/mm/stab.c: In function "stab_initialize":
arch/powerpc/mm/stab.c:282: error: implicit declaration of function "HvCall1"
arch/powerpc/mm/stab.c:282: error: "HvCallBaseSetASR" undeclared (first use in this function)
arch/powerpc/mm/stab.c:282: error: (Each undeclared identifier is reported only once
arch/powerpc/mm/stab.c:282: error: for each function it appears in.)
make[1]: *** [arch/powerpc/mm/stab.o] Error 1
make: *** [arch/powerpc/mm] Error 2
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
vmemmap_populate will printk (with KERN_WARNING) for a lot of pages
if CONFIG_SPARSEMEM_VMEMMAP is enabled (at least it does on iSeries).
Use pr_debug for it instead.
Replace the only other use of DBG in this file with pr_debug as well.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The patch "KVM: fix !SMP build error" change the way smp_call_function()
actually uses the passed in function names on non-SMP builds. So
previously it was never caught that the function passed in was never
actually defined.
This causes a build error on ppc64_defconfig + CONFIG_SMP=n:
arch/powerpc/mm/tlb_64.c: In function 'pgtable_free_now':
arch/powerpc/mm/tlb_64.c:71: error: 'pte_free_smp_sync' undeclared (first use in this function)
arch/powerpc/mm/tlb_64.c:71: error: (Each undeclared identifier is reported only once
arch/powerpc/mm/tlb_64.c:71: error: for each function it appears in.)
So we need to define it even if CONFIG_SMP is off. Either that or ifdef
out the smp_call_function() call, but that's ugly.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now that we have 1TB segment size support, we need to be using the
GET_ESID_1T macro when comparing ESID values for pc, stack, and
unmapped_base within switch_slb(). A new helper function called
esids_match() contains the logic for deciding when to call GET_ESID
and GET_ESID_1T.
This fixes a duplicate-slb-entry inspired machine-check exception I
was seeing when trying to run java on a power6 partition.
Tested on power6 and power5.
Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes the error
error: implicit declaration of function "udbg_printf"
We have a few spots where we reference udbg_printf() without #including
udbg.h. These are within #ifdef DEBUG blocks, so unnoticed until we do
a #define DEBUG or #define DEBUG_LOW nearby.
Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
mmu_mapin_ram() loops over total_lowmem to setup page tables. However, if
total_lowmem is less that 16M, the subtraction rolls over and results in
a number just under 4G (because total_lowmem is an unsigned value).
This patch rejigs the loop from countup to countdown to eliminate the
bug.
Special thanks to Magnus Hjorth who wrote the original patch to fix this
bug. This patch improves on his by making the loop code simpler (which
also eliminates the possibility of another rollover at the high end)
and also applies the change to arch/powerpc.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
The 44x family has an interesting "feature" which is a virtually
tagged instruction cache (yuck !). So far, we haven't dealt with
it properly, which means we've been mostly lucky or people didn't
report the problems, unless people have been running custom patches
in their distro...
This is an attempt at fixing it properly. I chose to do it by
setting a global flag whenever we change a PTE that was previously
marked executable, and flush the entire instruction cache upon
return to user space when that happens.
This is a bit heavy handed, but it's hard to do more fine grained
flushes as the icbi instruction, on those processor, for some very
strange reasons (since the cache is virtually mapped) still requires
a valid TLB entry for reading in the target address space, which
isn't something I want to deal with.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
On 4xx CPUs, the current implementation of flush_tlb_page() uses
a low level _tlbie() assembly function that only works for the
current PID. Thus, invalidations caused by, for example, a COW
fault triggered by get_user_pages() from a different context will
not work properly, causing among other things, gdb breakpoints
to fail.
This patch adds a "pid" argument to _tlbie() on 4xx processors,
and uses it to flush entries in the right context. FSL BookE
also gets the argument but it seems they don't need it (their
tlbivax form ignores the PID when invalidating according to the
document I have).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
When demoting a process to use 4K HW pages (instead of 64K), which
happens under various circumstances such as doing cache inhibited
mappings on machines that do not support 64K CI pages, the assembly
hash code calls back into the C function flush_hash_page(). This
function prototype was recently changed to accomodate for 1T segments
but the assembly call site was not updated, causing applications that
do demotion to hang. In addition, when updating the per-CPU PACA for
the new sizes, we didn't properly update the slice "map", thus causing
the SLB miss code to re-insert segments for the wrong size.
This fixes both and adds a warning comment next to the C
implementation to try to avoid problems next time someone changes it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
is_init() is an ambiguous name for the pid==1 check. Split it into
is_global_init() and is_container_init().
A cgroup init has it's tsk->pid == 1.
A global init also has it's tsk->pid == 1 and it's active pid namespace
is the init_pid_ns. But rather than check the active pid namespace,
compare the task structure with 'init_pid_ns.child_reaper', which is
initialized during boot to the /sbin/init process and never changes.
Changelog:
2.6.22-rc4-mm2-pidns1:
- Use 'init_pid_ns.child_reaper' to determine if a given task is the
global init (/sbin/init) process. This would improve performance
and remove dependence on the task_pid().
2.6.21-mm2-pidns2:
- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
ppc,avr32}/traps.c for the _exception() call to is_global_init().
This way, we kill only the cgroup if the cgroup's init has a
bug rather than force a kernel panic.
[akpm@linux-foundation.org: fix comment]
[sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
[bunk@stusta.de: kernel/pid.c: remove unused exports]
[sukadev@us.ibm.com: Fix capability.c to work with threaded init]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Herbert Poetzel <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (24 commits)
[POWERPC] Fix vmemmap warning in init_64.c
[POWERPC] Fix 64 bits vDSO DWARF info for CR register
[POWERPC] Add 1TB workaround for PA6T
[POWERPC] Enable NO_HZ and high res timers for pseries and ppc64 configs
[POWERPC] Quieten cache information at boot
[POWERPC] Quieten clockevent printk
[POWERPC] Enable SLUB in *_defconfig
[POWERPC] Fix 1TB segment detection
[POWERPC] Fix iSeries_hpte_insert prototype
[POWERPC] Fix copyright symbol
[POWERPC] ibmebus: Move to of_device and of_platform_driver, match eHCA and eHEA drivers
[POWERPC] ibmebus: Add device creation and bus probing based on of_device
[POWERPC] ibmebus: Remove bus match/probe/remove functions
[POWERPC] Move of_device allocation into of_device.[ch]
[POWERPC] mpc52xx: device tree changes for FEC and MDIO
[POWERPC] bestcomm: GenBD task support
[POWERPC] bestcomm: FEC task support
[POWERPC] bestcomm: ATA task support
[POWERPC] bestcomm: core bestcomm support for Freescale MPC5200
[POWERPC] mpc52xx: Update mpc52xx_psc structure with B revision changes
...
Slab constructors currently have a flags parameter that is never used. And
the order of the arguments is opposite to other slab functions. The object
pointer is placed before the kmem_cache pointer.
Convert
ctor(void *object, struct kmem_cache *s, unsigned long flags)
to
ctor(struct kmem_cache *s, void *object)
throughout the kernel
[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the right printk format to silence the following warning.
CC arch/powerpc/mm/init_64.o
arch/powerpc/mm/init_64.c: In function 'vmemmap_populate':
arch/powerpc/mm/init_64.c:243: warning: format '%p' expects type 'void *', but argument 4 has type 'long unsigned int'
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
PA6T has a bug where the slbie instruction does not honor the large
segment bit. As a result, we have to always use slbia when switching
context.
We don't have to worry about changing the slbie's during fault processing,
since they should never be replacing one VSID with another using the
same ESID. I.e. there's no risk for inserting duplicate entries due to a
failed slbie of the old entry. So as long as we clear it out on context
switch we should be fine.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Buglet in the 1TB detection makes it return after checking the first
property word, even if it's not a match.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
htab_bolt_mapping takes another argument now the 1TB code has been
merged. Update vmemmap_populate to match.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now, arch dependent code around CONFIG_MEMORY_HOTREMOVE is a mess.
This patch cleans up them. This is against 2.6.23-rc6-mm1.
- fix compile failure on ia64/ CONFIG_MEMORY_HOTPLUG && !CONFIG_MEMORY_HOTREMOVE case.
- For !CONFIG_MEMORY_HOTREMOVE, add generic no-op remove_memory(),
which returns -EINVAL.
- removed remove_pages() only used in powerpc.
- removed no-op remove_memory() in i386, sh, sparc64, x86_64.
- only powerpc returns -ENOSYS at memory hot remove(no-op). changes it
to return -EINVAL.
Note:
Currently, only ia64 supports CONFIG_MEMORY_HOTREMOVE. I welcome other
archs if there are requirements and testers.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Enable virtual memmap support for SPARSEMEM on PPC64 systems. Slice a 16th
off the end of the linear mapping space and use that to hold the vmemmap.
Uses the same size mapping as uses in the linear 1:1 kernel mapping.
[pbadari@gmail.com: fix warning]
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes the kernel use 1TB segments for all kernel mappings and for
user addresses of 1TB and above, on machines which support them
(currently POWER5+, POWER6 and PA6T).
We detect that the machine supports 1TB segments by looking at the
ibm,processor-segment-sizes property in the device tree.
We don't currently use 1TB segments for user addresses < 1T, since
that would effectively prevent 32-bit processes from using huge pages
unless we also had a way to revert to using 256MB segments. That
would be possible but would involve extra complications (such as
keeping track of which segment size was used when HPTEs were inserted)
and is not addressed here.
Parts of this patch were originally written by Ben Herrenschmidt.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The CONFIG_FSL_BOOKE mmu setup code fails when CONFIG_HIGHMEM=y
and the 3 fixed TLB entries cannot exactly map the lowmem size.
Each TLB entry can map 4MB, 16MB, 64MB or 256MB, so the failure
is observed when the kernel lowmem size is not equal to the
sum of up to 3 of those values.
Normally, memory is sized in nice numbers, but I observed this
problem while testing a crash dump kernel. The failure can
also be observed by artificially reducing the kernel's main
memory via the mem= kernel command line parameter.
This commit fixes the problem by setting __initial_memory_limit
in adjust_total_lowmem().
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The 8xx can only support a max of 8M during early boot (it seems a lot of
8xx boards only have 8M so the bug was never triggered), but the early
allocator isn't aware of this. The following change makes it able to run
with larger memory.
Signed-off-by: John Traill <john.traill@freescale.com>
Signed-off-by: Vitaly Bordug <vitb@kernel.crashing.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The entries are only 32-bit, so restrict the virtual address to stay
below 0xffff_ffff. With KERNELBASE set to 0xc000_0000, this in effect
restricts access to the first 1GB of real memory.
Make setup_kcore conditional on CONFIG_PROC_KCORE for both 32/64.
Signed-off-by: Ed Swarthout <Ed.Swarthout@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Linus made this suggestion for the x86 merge and this starts the process
for powerpc. We assume that CONFIG_PPC64 implies CONFIG_PPC_MERGE and
CONFIG_PPC_STD_MMU_32 implies CONFIG_PPC_STD_MMU.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
After talking to an IBM POWER hypervisor (PHYP) design and development
guy, there seems to be no need for memory barriers when updating the SLB
shadow buffer provided we only update it from the current CPU, which we
do.
Also, these guys see no need in the future for these barriers.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Export new __io{re,un}map_at() symbols so modules can use them.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>