Commit Graph

580 Commits

Author SHA1 Message Date
Becky Bruce
49a8496525 powerpc: Allow mem=x cmdline to work with 4G+
We're currently choking on mem=4g (and above) due to memory_limit
being specified as an unsigned long. Make memory_limit
phys_addr_t to fix this.
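A minimal userspace sketch of the truncation, assuming a 32-bit build where `unsigned long` is 32 bits and `phys_addr_t` is 64 bits. `parse_mem_size()` is a hypothetical stand-in for the kernel's memparse(), and the fixed-width typedefs only model the two kernel types.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's memparse(): parses "4g", "512m", ... */
static uint64_t parse_mem_size(const char *s)
{
    char *end;
    uint64_t val = strtoull(s, &end, 0);

    switch (tolower((unsigned char)*end)) {
    case 'g': val <<= 10; /* fall through */
    case 'm': val <<= 10; /* fall through */
    case 'k': val <<= 10; break;
    default: break;
    }
    return val;
}

/* Model a 32-bit kernel: unsigned long is 32 bits, phys_addr_t is 64 bits. */
typedef uint32_t ulong32_t;
typedef uint64_t phys_addr_t;

/* Old behaviour: memory_limit held in an unsigned long. */
static ulong32_t limit_as_ulong(const char *arg)
{
    return (ulong32_t)parse_mem_size(arg); /* silently truncates 4G and above */
}

/* Fixed behaviour: memory_limit held in a phys_addr_t. */
static phys_addr_t limit_as_phys(const char *arg)
{
    return parse_mem_size(arg);
}
```

With these types, `mem=4g` becomes 0 in the 32-bit variable but keeps its full value of 0x100000000 in the 64-bit one.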

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-15 16:43:41 +10:00
Ingo Molnar
e7fd5d4b3d Merge branch 'linus' into perfcounters/core
Merge reason: This branch was on -rc1, refresh it to almost-rc4 to pick up
              the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29 14:47:05 +02:00
Stephen Rothwell
b62c31ae40 powerpc: fix for long standing bug noticed by gcc 4.4.0
Previous gcc versions didn't notice this because one of the preceding
#ifs always evaluated to true.

gcc 4.4.0 produced this error:

arch/powerpc/mm/tlb_nohash_low.S:206:6: error: #elif with no expression
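For illustration, a small sketch of the kind of preprocessor chain gcc 4.4 rejects and a corrected form. The configuration symbols are hypothetical, and the assumption here is that the fix is to make the catch-all branch an `#else` (or to give the `#elif` a real expression).

```c
#include <assert.h>

/* Hypothetical configuration symbols; only one is defined here. */
#define CONFIG_SKETCH_B 1

/*
 * Broken form that gcc 4.4 rejects ("#elif with no expression"), even
 * when an earlier branch was already taken:
 *
 *   #if defined(CONFIG_SKETCH_A)
 *   ...
 *   #elif
 *   ...
 *   #endif
 *
 * Corrected form: the catch-all branch uses #else.
 */
#if defined(CONFIG_SKETCH_A)
static const char *tlb_variant(void) { return "A"; }
#elif defined(CONFIG_SKETCH_B)
static const char *tlb_variant(void) { return "B"; }
#else
static const char *tlb_variant(void) { return "fallback"; }
#endif
```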

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-04-23 08:52:16 -05:00
Kumar Gala
323d23aeac Revert "powerpc: Add support for early tlbilx opcode"
This reverts commit e996557740.  Our HW
guys were able to fix this so it never sees the light of day.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-04-23 08:51:22 -05:00
Michael Ellerman
24f1ce803c powerpc: Fix crash on CPU hotplug
early_init_mmu_secondary() is called at CPU hotplug time, so it
must be marked as __cpuinit, not __init.

Caused by 757c74d2 ("powerpc/mm: Introduce early_init_mmu() on 64-bit").

Tested-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-22 14:56:34 +10:00
Peter Zijlstra
78f13e9525 perf_counter: allow for data addresses to be recorded
Paul suggested we allow for data addresses to be recorded along with
the traditional IPs as power can provide these.

For now, only the software pagefault events provide data addresses,
but in the future power might as well for some events.

x86 doesn't seem capable of providing this atm.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130409.394816925@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08 19:05:56 +02:00
Kumar Gala
52ce67f157 powerpc/mm: Fix compile warning
arch/powerpc/mm/tlb_nohash.c: In function 'flush_tlb_mm':
arch/powerpc/mm/tlb_nohash.c:128: warning: unused variable 'cpu_mask'

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-04-07 22:11:10 -05:00
Kumar Gala
e996557740 powerpc: Add support for early tlbilx opcode
During the ISA 2.06 development the opcode for tlbilx changed and some
early implementations used the old opcode.  Add support for an MMU_FTR
fixup to deal with this.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-04-07 01:36:30 -05:00
Peter Zijlstra
ac17dc8e58 perf_counter: provide major/minor page fault software events
Provide separate sw counters for major and minor page faults.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:29:40 +02:00
Peter Zijlstra
7dd1fcc258 perf_counter: provide pagefault software events
We use the generic software counter infrastructure to provide
page fault events.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:29:37 +02:00
Benjamin Herrenschmidt
757c74d298 powerpc/mm: Introduce early_init_mmu() on 64-bit
This moves some MMU related init code out of setup_64.c into hash_utils_64.c
and calls it early_init_mmu() and early_init_mmu_secondary(). This will
make it easier to plug in a new MMU type.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:34 +11:00
Benjamin Herrenschmidt
ff7c660092 powerpc/mm: Fix printk type warning in mmu_context_nohash
We need to use %zu instead of %d when printing a sizeof()
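A short sketch of the rule: `sizeof` yields a `size_t`, whose matching printf conversion is `%zu`. The structure is hypothetical; only its size is printed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical structure; only its size matters here. */
struct context_sketch {
    unsigned long id;
    void *owner;
};

/* sizeof yields size_t, so the matching conversion is %zu, not %d. */
static int describe_size(char *buf, size_t len)
{
    return snprintf(buf, len, "context is %zu bytes",
                    sizeof(struct context_sketch));
}
```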

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:34 +11:00
Benjamin Herrenschmidt
d62cbf45a8 powerpc/mm: Rename arch/powerpc/kernel/mmap.c to mmap_64.c
This file is only useful on 64-bit, so we name it accordingly.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:33 +11:00
Benjamin Herrenschmidt
8d1cf34e7a powerpc/mm: Tweak PTE bit combination definitions
This patch tweaks the way some PTE bit combinations are defined, in such a
way that the 32 and 64-bit variants become almost identical, which will
make it easier to bring in a new common pte-* file for the new variant
of the Book3-E support.

The combination of bits defining access to kernel pages are now clearly
separated from the combination used by userspace and the core VM. The
resulting generated code should remain identical unless I made a mistake.

Note: While at it, I removed a nonsensical statement related to CONFIG_KGDB
in ppc_mmu_32.c which could cause kernel mappings to be user-accessible when
that option is enabled. Probably something that bit-rotted.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:33 +11:00
Rusty Russell
56aa4129e8 cpumask: Use mm_cpumask() wrapper instead of cpu_vm_mask
Makes code futureproof against the impending change to mm->cpu_vm_mask.

It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:29 +11:00
Benjamin Herrenschmidt
9e5efaa936 powerpc/mm: Properly wire up get_user_pages_fast() on 32-bit
While we did add support for _PAGE_SPECIAL on some 32-bit platforms,
we never actually built get_user_pages_fast() on them. This fixes
that, which requires a little bit of ifdef'ing.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:11:34 +11:00
Benjamin Herrenschmidt
1cdab55d8a powerpc: Wire up /proc/vmallocinfo to our ioremap()
This adds the necessary bits and pieces to the powerpc implementation of
ioremap so that it benefits from caller tracking in /proc/vmallocinfo, at least
for ioremaps done after mem init, as the earlier ones aren't tracked.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:10:14 +11:00
Kumar Gala
c3071951d0 powerpc/fsl-booke: Add support for tlbilx instructions
The e500mc core supports the new tlbilx instructions that do core
local invalidates and also provide us the ability to take down
all TLB entries matching a given PID.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-03-09 09:25:38 -05:00
Anton Blanchard
002b0ec73d powerpc: Increase stack gap on 64bit binaries
On 64bit there is a possibility that our stack and mmap randomisation will put
the two close enough together that we can't expand our stack to the size the
ulimit specifies.

To avoid this, start the upper mmap address at 1GB + 128MB below the top of our
address space, so in the worst case we end up with the same ~128MB hole as in
32bit. This works because we randomise the stack over a 1GB range.
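The arithmetic can be sketched as follows. This is a simplified model, assuming a 16 TB (2^44) 64-bit task size, a 1 GB stack randomisation range and a 128 MB minimum hole; `mmap_base()` and `worst_case_hole()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1M  (UINT64_C(1) << 20)
#define SZ_1G  (UINT64_C(1) << 30)

/* Assumed values for illustration: stack randomised over 1 GB, 128 MB gap. */
#define STACK_RND_RANGE  SZ_1G
#define MIN_GAP          (128 * SZ_1M)

/* Hypothetical helper: where the top-down mmap area starts. */
static uint64_t mmap_base(uint64_t task_size)
{
    return task_size - (STACK_RND_RANGE + MIN_GAP);
}

/* Room left between the lowest possible stack top and the mmap base. */
static uint64_t worst_case_hole(uint64_t task_size)
{
    uint64_t lowest_stack_top = task_size - STACK_RND_RANGE;

    return lowest_stack_top - mmap_base(task_size);
}
```

However far down the stack top is randomised within its 1 GB range, at least 128 MB remains for it to grow into before reaching the mmap area.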

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:21 +11:00
Anton Blanchard
a5adc91a4b powerpc: Ensure random space between stack and mmaps
get_random_int() returns the same value within a 1 jiffy interval. This means
that the mmap and stack regions will almost always end up the same distance
apart, making a relative offset based attack possible.

To fix this, shift the randomness we use for the mmap region by 1 bit.
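A simplified model of why the shift helps, assuming both regions draw their page-granular offset from the same random value `r` (as happens within one jiffy) and both use a 1 GB range. The range and helper names are assumptions, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define RND_PAGES  (UINT32_C(1) << (30 - PAGE_SHIFT)) /* assumed 1 GB range */

/* Page-aligned randomisation offset derived from a random value. */
static uint64_t rnd_offset(uint32_t r)
{
    return (uint64_t)(r % RND_PAGES) << PAGE_SHIFT;
}

/*
 * How much randomisation changes the stack-to-mmap distance. If both
 * regions use the same r, the offsets cancel and the distance is fixed;
 * shifting r by one bit for mmap makes the two offsets differ.
 */
static int64_t distance_jitter(uint32_t r, int shift_mmap)
{
    uint64_t stack = rnd_offset(r);
    uint64_t mmap  = rnd_offset(shift_mmap ? r >> 1 : r);

    return (int64_t)mmap - (int64_t)stack;
}
```

Without the shift, `distance_jitter(r, 0)` is 0 for every `r`, which is what makes a relative-offset attack possible.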

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:21 +11:00
Anton Blanchard
9f14c42d75 powerpc: Randomise mmap start address
Randomise mmap start address - 8MB on 32bit and 1GB on 64bit tasks.
Until ppc32 uses the mmap.c functionality, this is ppc64 specific.
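A sketch of a page-granular randomisation offset with the ranges described above (8 MB for 32-bit tasks, 1 GB for 64-bit tasks). Here `rnd` stands in for a random 32-bit value such as the one get_random_int() returns, and 4 KB pages are assumed. The mmap base would then be lowered by the returned amount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumed 4 KB pages */

/* Returns a page-aligned offset below 8 MB (32-bit) or 1 GB (64-bit). */
static uint64_t mmap_rnd_sketch(uint32_t rnd, bool is_32bit_task)
{
    unsigned int range_shift = is_32bit_task ? 23 : 30; /* 8 MB : 1 GB */
    uint64_t pages = rnd % (UINT32_C(1) << (range_shift - PAGE_SHIFT));

    return pages << PAGE_SHIFT;
}
```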

Before:

# ./test & cat /proc/${!}/maps|tail -2|head -1
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0

After:
# ./test & cat /proc/${!}/maps|tail -2|head -1
f718b000-f7b8c000 rw-p f718b000 00:00 0
f7551000-f7f52000 rw-p f7551000 00:00 0
f6ee7000-f78e8000 rw-p f6ee7000 00:00 0
f74d4000-f7ed5000 rw-p f74d4000 00:00 0
f6e9d000-f789e000 rw-p f6e9d000 00:00 0

Similar for 64bit, but with 1GB of scatter:
# ./test & cat /proc/${!}/maps|tail -2|head -1
fffb97b5000-fffb97b6000 rw-p fffb97b5000 00:00 0
fffce9a3000-fffce9a4000 rw-p fffce9a3000 00:00 0
fffeaaf2000-fffeaaf3000 rw-p fffeaaf2000 00:00 0
fffd88ac000-fffd88ad000 rw-p fffd88ac000 00:00 0
fffbc62e000-fffbc62f000 rw-p fffbc62e000 00:00 0

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:07 +11:00
Anton Blanchard
13a2cb3694 powerpc: Rearrange mmap.c
Rearrange mmap.c to better match the x86 version.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:06 +11:00
Nathan Fontenot
0f16ef7fd3 powerpc/numa: Cleanup hot_add_scn_to_nid
This patch reworks hot_add_scn_to_nid() and its supporting functions
to make them easier to understand.  There are no functional changes in
this patch, and it has been tested on machines with memory represented in the
device tree as memory nodes and in the ibm,dynamic-memory property.

My previous patch that introduced support for hotplug memory add on
systems whose memory was represented by the ibm,dynamic-memory property
of the device tree only left the code more unintelligible.  This
will hopefully make things easier to understand.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:04 +11:00
Anton Blanchard
13870b6575 powerpc/mm: Reduce hashtable size when using 64kB pages
At the moment we size the hashtable based on 4kB pages / 2, even on a
64kB kernel. This results in a hashtable that is much larger than it
needs to be.

Grab the real page size and size the hashtable based on that.

Note: This only has effect on non-hypervisor machines.
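A sketch of page-size-aware sizing. It assumes the scheme of rounding RAM up to a power of two, allowing one 128-byte PTEG per two pages, and enforcing a 2048-PTEG minimum. These constants are assumptions chosen to mirror the 64-bit hash MMU, not a copy of the kernel function.

```c
#include <assert.h>
#include <stdint.h>

#define PTEG_SIZE  128u                 /* assumed bytes per PTE group */
#define MIN_PTEGS  (UINT64_C(1) << 11)  /* assumed minimum table size */

static uint64_t round_up_pow2(uint64_t x)
{
    uint64_t p = 1;

    while (p < x)
        p <<= 1;
    return p;
}

/* Hash table size for mem_bytes of RAM with pages of 1 << page_shift bytes. */
static uint64_t htab_size_sketch(uint64_t mem_bytes, unsigned int page_shift)
{
    uint64_t ptegs = round_up_pow2(mem_bytes) >> (page_shift + 1); /* pages / 2 */

    if (ptegs < MIN_PTEGS)
        ptegs = MIN_PTEGS;
    return ptegs * PTEG_SIZE;
}
```

Under these assumptions, 4 GB of RAM gives a 64 MB table when sized for 4 KB pages but only a 4 MB table when sized for 64 KB pages.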

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:58 +11:00
Benjamin Herrenschmidt
3b7faeb49e Merge commit 'kumar/next' into next
2009-02-18 13:23:30 +11:00
Benjamin Herrenschmidt
82a0a1cc8f Merge commit 'origin/master' into next
Manual merge of:
	arch/powerpc/include/asm/pgtable-ppc32.h
2009-02-18 13:19:25 +11:00
Dave Hansen
06eccea6c3 powerpc/mm: Fix numa reserve bootmem page selection
Fix the powerpc NUMA reserve bootmem page selection logic.

commit 8f64e1f2d1 (powerpc: Reserve
in bootmem lmb reserved regions that cross NUMA nodes) changed
the logic for how the powerpc LMB reserved regions were converted
to bootmem reserved regions.  As the following discussion reports,
the new logic was not correct.

mark_reserved_regions_for_nid() goes through each LMB on the
system that specifies a reserved area.  It searches for
active regions that intersect with that LMB and are on the
specified node.  It attempts to bootmem-reserve only the area
where the active region and the reserved LMB intersect.  We
can not reserve things on other nodes as they may not have
bootmem structures allocated, yet.

We base the size of the bootmem reservation on two possible
things.  Normally, we just make the reservation start and
stop exactly at the start and end of the LMB.

However, the LMB reservations are not aware of NUMA nodes and
on occasion a single LMB may cross into several adjacent
active regions.  Those may even be on different NUMA nodes
and will require separate calls to the bootmem reserve
functions.  So, the bootmem reservation must be trimmed to
fit inside the current active region.

That's all fine and dandy, but we trim the reservation
in a page-aligned fashion.  That's bad because we start the
reservation at a non-page-aligned address: physbase.

The reservation may only span 2 bytes, but those bytes
may span two pfns and cause a reserve_size of 2*PAGE_SIZE.

Take the case where you reserve 0x2 bytes at 0x0fff and
where the active region ends at 0x1000.  You'll jump into
that if() statement, but node_ar.end_pfn=0x1 and
start_pfn=0x0.  You'll end up with a reserve_size=0x1000,
and then call

  reserve_bootmem_node(node, physbase=0xfff, size=0x1000);

0x1000 may not be on the same node as 0xfff.  Oops.

In almost all the vm code, end_<anything> is not inclusive.
If you have an end_pfn of 0x1234, page 0x1234 is not
included in the range.  Using PFN_UP() instead of the open-coded
'>> PAGE_SHIFT' will make this consistent with the other VM
code.

We also need to do math for the reserved size with physbase
instead of start_pfn.  node_ar.end_pfn << PAGE_SHIFT is
*precisely* the end of the node.  However,
(start_pfn << PAGE_SHIFT) is *NOT* precisely the beginning
of the reserved area.  That is, of course, physbase.
If we don't use physbase here, the reserve_size can be
made too large.
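A sketch of the corrected trimming arithmetic, applied to the example above (2 bytes at 0xfff, active region ending at pfn 0x1, 4 KB pages assumed). The helper names are hypothetical. `PFN_UP` is written out the same way the kernel macro rounds up.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
/* Round a byte address up to the next page frame number. */
#define PFN_UP(x)  (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* Corrected: end_pfn is exclusive and the size is measured from physbase. */
static uint64_t trimmed_reserve_size(uint64_t physbase, uint64_t size,
                                     uint64_t node_end_pfn)
{
    uint64_t end_pfn = PFN_UP(physbase + size);

    if (end_pfn > node_end_pfn)
        return (node_end_pfn << PAGE_SHIFT) - physbase;
    return size;
}

/* Old arithmetic: measured from start_pfn, overshooting the node end. */
static uint64_t old_trimmed_size(uint64_t physbase, uint64_t node_end_pfn)
{
    uint64_t start_pfn = physbase >> PAGE_SHIFT;

    return (node_end_pfn << PAGE_SHIFT) - (start_pfn << PAGE_SHIFT);
}
```

For the example, the corrected size is 1 byte, ending exactly at 0x1000. The old formula gives 0x1000 bytes starting at 0xfff, which spills into the next node.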

From: Dave Hansen <dave@linux.vnet.ibm.com>
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>  Tested on PS3.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-13 16:37:45 +11:00
Kumar Gala
96a8bac589 powerpc/fsl-booke: Fix compile warning
arch/powerpc/mm/fsl_booke_mmu.c: In function 'adjust_total_lowmem':
arch/powerpc/mm/fsl_booke_mmu.c:221: warning: format '%ld' expects type 'long int', but argument 3 has type 'phys_addr_t'

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-02-12 16:54:53 -06:00
Kumar Gala
d66c82ea45 powerpc/fsl-booke: Add new ISA 2.06 page sizes and MAS defines
The Power ISA 2.06 added power of two page sizes to the embedded MMU
architecture.  It's done in such a way as to be code-compatible with the
existing HW.  Made the minor code changes to support both power of two
and power of four page sizes.  Also added some new MAS bits and macros
that are defined as part of the 2.06 ISA.  Renamed some things to use
the 'Book-3e' concept to convey the new MMU that is based on the
Freescale Book-E MMU programming model.

Note: it's still invalid to try to use a page size that isn't supported
by the CPU.
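A sketch of the two TSIZE encodings as understood here. The assumption is that classic FSL Book-E encodes a size of 4^TSIZE KB and Power ISA 2.06 encodes 2^TSIZE KB, so every power-of-four size maps to exactly twice its old value. That relationship is what keeps the change code-compatible.

```c
#include <assert.h>
#include <stdint.h>

static int log2_u64(uint64_t x)
{
    int n = -1;

    while (x) {
        x >>= 1;
        n++;
    }
    return n;
}

/* ISA 2.06 style (assumed): size = 2^TSIZE KB; -1 if not a power of two. */
static int tsize_isa206(uint64_t size_kb)
{
    if (size_kb == 0 || (size_kb & (size_kb - 1)))
        return -1;
    return log2_u64(size_kb);
}

/* Classic FSL Book-E (assumed): size = 4^TSIZE KB; -1 if not a power of four. */
static int tsize_fsl_booke(uint64_t size_kb)
{
    int l = tsize_isa206(size_kb);

    return (l < 0 || (l & 1)) ? -1 : l / 2;
}
```

An 8 KB page, for example, is encodable only in the 2.06 scheme, and the CPU must still actually support it.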

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-02-12 16:37:11 -06:00
Kumar Gala
f99fb8a2cb powerpc/mm: Fix _PAGE_COHERENT support on classic ppc32 HW
The following commit:

commit 64b3d0e812
Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date:   Thu Dec 18 19:13:51 2008 +0000

    powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED

broke setting of the _PAGE_COHERENT bit in the PPC HW PTE.  Since we now
actually set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing it
out before we propagate it to the PPC HW PTE.

Reported-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 16:07:02 +11:00
Benjamin Herrenschmidt
8d30c14cab powerpc/mm: Rework I$/D$ coherency (v3)
This patch reworks the way we do I and D cache coherency on PowerPC.

The "old" way was split in 3 different parts depending on the processor type:

   - Hash with per-page exec support (64-bit and >= POWER4 only) does it
at hashing time, by preventing exec on unclean pages and cleaning pages
on exec faults.

   - Everything without per-page exec support (32-bit hash, 8xx, and
64-bit < POWER4) does it for all pages going to user space in update_mmu_cache().

   - Embedded with per-page exec support does it from do_page_fault() on
exec faults, in a way similar to what the hash code does.

That leads to confusion, and bugs. For example, the method using update_mmu_cache()
is racy on SMP where another processor can see the new PTE and hash it in before
we have cleaned the cache, and then blow up trying to execute. This is hard to hit but
I think it has bitten us in the past.

Also, it's inefficient for embedded where we always end up having to do at least
one more page fault.

This reworks the whole thing by moving the cache sync into two main call sites,
though we keep different behaviours depending on the HW capability. The call
sites are set_pte_at() which is now made out of line, and ptep_set_access_flags()
which joins the former in pgtable.c.

The base idea for Embedded with per-page exec support is that we now do the
flush at set_pte_at() time when coming from an exec fault, which allows us
to avoid the double fault problem completely (we can even improve the situation
more by implementing TLB preload in update_mmu_cache() but that's for later).

If for some reason we didn't do it there and we try to execute, we'll hit
the page fault, which will do a minor fault, which will hit ptep_set_access_flags()
to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed. We now make
that function also perform the I/D cache sync for exec faults. This second path
is the catch-all for things that weren't cleaned at set_pte_at() time.

For CPUs without per-page exec support, we always do the sync at set_pte_at(),
thus guaranteeing that when the PTE is visible to other processors, the cache
is clean.

For the 64-bit hash with per-page exec support case, we keep the old mechanism
for now. I'll look into changing it later, once I've reworked a bit how we
use _PAGE_EXEC.

This is also a first step for adding _PAGE_EXEC support for embedded platforms.
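A simplified model of the set_pte_at() decision described above, not the kernel source. A per-page "clean" flag stands in for the kernel's page flag, and a counter stands in for the actual cache flush. The 64-bit hash case, which keeps its old mechanism, is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

struct page_model {
    bool dcache_clean;
};

static int flush_count;

/* Stand-in for flushing the D-cache and invalidating the I-cache. */
static void sync_icache_dcache(struct page_model *pg)
{
    flush_count++;
    pg->dcache_clean = true;
}

/*
 * cpu_has_exec_perm: the MMU supports per-page exec permission.
 * exec_fault: the PTE is being installed because of an instruction fetch.
 */
static void set_pte_at_model(struct page_model *pg, bool cpu_has_exec_perm,
                             bool exec_fault)
{
    if (pg->dcache_clean)
        return;
    /* Without per-page exec, sync before the PTE becomes visible. */
    if (!cpu_has_exec_perm || exec_fault)
        sync_icache_dcache(pg);
    /* Otherwise defer: a later exec fault takes the catch-all path. */
}
```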

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 16:00:10 +11:00
Anton Blanchard
91b0f5ec53 powerpc/mm: Move 64-bit unmapped_area to top of address space
We currently place mmaps just below the stack on 32bit, but leave them
in the middle of the address space on 64bit:

00100000-00120000 r-xp 00100000 00:00 0                    [vdso]
10000000-10010000 r-xp 00000000 08:06 179534               /tmp/sleep
10010000-10020000 rw-p 00000000 08:06 179534               /tmp/sleep
10020000-10130000 rw-p 10020000 00:00 0                    [heap]
40000000000-40000030000 r-xp 00000000 08:06 440743         /lib64/ld-2.9.so
40000030000-40000040000 rw-p 00020000 08:06 440743         /lib64/ld-2.9.so
40000050000-400001f0000 r-xp 00000000 08:06 440671         /lib64/libc-2.9.so
400001f0000-40000200000 r--p 00190000 08:06 440671         /lib64/libc-2.9.so
40000200000-40000220000 rw-p 001a0000 08:06 440671         /lib64/libc-2.9.so
40000220000-40008230000 rw-p 40000220000 00:00 0
fffffbc0000-fffffd10000 rw-p fffffeb0000 00:00 0           [stack]

Right now it isn't an issue, but at some stage we will run into mmap or
hugetlb allocation issues. Using the same layout as 32bit gives us
some breathing room. This matches what x86-64 is doing too.

00100000-00103000 r-xp 00100000 00:00 0                    [vdso]
10000000-10001000 r-xp 00000000 08:06 554894               /tmp/test
10010000-10011000 r--p 00000000 08:06 554894               /tmp/test
10011000-10012000 rw-p 00001000 08:06 554894               /tmp/test
10012000-10113000 rw-p 10012000 00:00 0                    [heap]
fffefdf7000-ffff7df8000 rw-p fffefdf7000 00:00 0
ffff7df8000-ffff7f97000 r-xp 00000000 08:06 130591         /lib64/libc-2.9.so
ffff7f97000-ffff7fa6000 ---p 0019f000 08:06 130591         /lib64/libc-2.9.so
ffff7fa6000-ffff7faa000 r--p 0019e000 08:06 130591         /lib64/libc-2.9.so
ffff7faa000-ffff7fc0000 rw-p 001a2000 08:06 130591         /lib64/libc-2.9.so
ffff7fc0000-ffff7fc4000 rw-p ffff7fc0000 00:00 0
ffff7fc4000-ffff7fec000 r-xp 00000000 08:06 130663         /lib64/ld-2.9.so
ffff7fee000-ffff7ff0000 rw-p ffff7fee000 00:00 0
ffff7ffa000-ffff7ffb000 rw-p ffff7ffa000 00:00 0
ffff7ffb000-ffff7ffc000 r--p 00027000 08:06 130663         /lib64/ld-2.9.so
ffff7ffc000-ffff7fff000 rw-p 00028000 08:06 130663         /lib64/ld-2.9.so
ffff7fff000-ffff8000000 rw-p ffff7fff000 00:00 0
fffffc59000-fffffc6e000 rw-p ffffffeb000 00:00 0           [stack]

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 16:00:07 +11:00
Milton Miller
8b16cd238d powerpc/numa: Remove redundant find_cpu_node()
Use of_get_cpu_node(), which is a superset of numa.c's find_cpu_node() and
lives in a less restrictive section (text vs cpuinit).

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 13:37:59 +11:00
Milton Miller
20fcefe5a0 powerpc/numa: Avoid possible reference beyond prop. length in find_min_common_depth()
find_min_common_depth() was checking the property length incorrectly.
The length is in bytes, not cells, and the code reads the second entry.
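A sketch of the safe check. The property length comes back in bytes, as of_get_property() reports it, so reading the second 32-bit cell requires at least `2 * sizeof(u32)` bytes. The helper is hypothetical, and device-tree byte-order conversion is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the second cell, or -1 if the property is too short to hold it. */
static int64_t second_cell(const uint32_t *prop, size_t prop_len_bytes)
{
    if (prop == NULL || prop_len_bytes < 2 * sizeof(uint32_t))
        return -1;
    return prop[1]; /* byte-order conversion omitted in this sketch */
}
```

Comparing the byte length against 2 (cells) would wrongly accept a 4-byte property and read past its end.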

Signed-off-By: Milton Miller <miltonm@bga.com>

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 13:37:58 +11:00
Benjamin Herrenschmidt
edbc29d76d Merge commit 'kumar/next' into next
2009-02-11 13:37:44 +11:00
Kumar Gala
6c24b17453 powerpc/fsl-booke: Fix mapping functions to use phys_addr_t
Fixed v_mapped_by_tlbcam() and p_mapped_by_tlbcam() to use phys_addr_t
instead of unsigned long.  In 36-bit physical mode we really need these
functions to deal with phys_addr_t when trying to match a physical
address or when returning one.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-02-09 21:11:55 -06:00
Trent Piepho
96051465fd powerpc/fsl-booke: Make CAM entries used for lowmem configurable
On booke processors, the code that maps low memory only uses up to three
CAM entries, even though there are sixteen and nothing else uses them.

Make this number configurable in the advanced options menu along with max
low memory size.  If one wants 1 GB of lowmem, then it's typically
necessary to have four CAM entries.

Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-28 18:16:54 -06:00
Trent Piepho
c8f3570b7e powerpc/fsl-booke: Allow larger CAM sizes than 256 MB
The code that maps kernel low memory would only use page sizes up to 256
MB.  On E500v2 pages up to 4 GB are supported.

However, a page must be aligned to a multiple of the page's size.  I.e.
256 MB pages must be aligned to a 256 MB boundary.  This was enforced by a
requirement that the physical and virtual addresses of the start of lowmem
be aligned to 256 MB.  Clearly requiring 1GB or 4GB alignment to allow
pages of that size isn't acceptable.

To solve this, I simply have adjust_total_lowmem() take alignment into
account when it decides what size pages to use.  Give it PAGE_OFFSET =
0x7000_0000, PHYSICAL_START = 0x3000_0000, and 2GB of RAM, and it will map
pages like this:
PA 0x3000_0000 VA 0x7000_0000 Size 256 MB
PA 0x4000_0000 VA 0x8000_0000 Size 1 GB
PA 0x8000_0000 VA 0xC000_0000 Size 256 MB
PA 0x9000_0000 VA 0xD000_0000 Size 256 MB
PA 0xA000_0000 VA 0xE000_0000 Size 256 MB

Because the lowmem mapping code now takes alignment into account,
PHYSICAL_ALIGN can be lowered from 256 MB to 64 MB.  Even lower might be
possible.  The lowmem code will work down to 4 kB but it's possible some of
the boot code will fail before then.  Poor alignment will force small pages
to be used, which combined with the limited number of TLB1 pages available,
will result in very little memory getting mapped.  So alignments less than
64 MB probably aren't very useful anyway.
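A sketch of alignment-aware page selection that reproduces the mapping table above. It assumes power-of-four page sizes from 4 KB to 4 GB and uses a page only if both PA and VA are aligned to its size and it fits in the remaining RAM. The helpers are hypothetical, not adjust_total_lowmem() itself.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_PAGE  (UINT64_C(4) << 10) /* 4 KB */
#define MAX_PAGE  (UINT64_C(4) << 30) /* 4 GB */

/* Largest power-of-four size that fits and is aligned for both PA and VA. */
static uint64_t pick_page_size(uint64_t pa, uint64_t va, uint64_t remaining)
{
    uint64_t size = MAX_PAGE;

    while (size > MIN_PAGE &&
           (size > remaining || (pa & (size - 1)) || (va & (size - 1))))
        size >>= 2; /* next smaller power of four */
    return size;
}

/* Fills sizes[] with up to max_cams entries; returns how many were used. */
static int map_lowmem(uint64_t pa, uint64_t va, uint64_t ram,
                      uint64_t *sizes, int max_cams)
{
    int n = 0;

    while (ram >= MIN_PAGE && n < max_cams) {
        uint64_t sz = pick_page_size(pa, va, ram);

        sizes[n++] = sz;
        pa += sz;
        va += sz;
        ram -= sz;
    }
    return n;
}
```

With PA 0x3000_0000, VA 0x7000_0000 and 2 GB of RAM, this yields 256 MB, 1 GB, 256 MB, 256 MB, 256 MB, which matches the table. The `max_cams` cap shows why a limited number of TLB1 entries limits how much memory gets mapped.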

Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-28 18:16:53 -06:00
Trent Piepho
f88747e7f6 powerpc/fsl-booke: Remove code duplication in lowmem mapping
The code to map lowmem uses three CAM aka TLB[1] entries to cover it.  The
size of each is stored in three globals named __cam0, __cam1, and __cam2.
All the code that uses them is duplicated three times, once for each of the
three variables.

We have these things called arrays and loops....

Once converted to use an array, it will be easier to make the number of
CAMs configurable.

Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-28 18:16:51 -06:00
Gerhard Pircher
4c456a67f5 powerpc/mm: Fix handling of _PAGE_COHERENT in BAT setup code
_PAGE_COHERENT is now always set in _PAGE_RAM and PAGE_KERNEL.
Thus it has to be masked out if the BAT mapping should be
non-cacheable or CPU_FTR_NEED_COHERENT is not set.

This will work on normal SMP setups because we force-set
CPU_FTR_NEED_COHERENT as part of CPU_FTR_COMMON on SMP.
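A sketch of the masking rule. The flag values are hypothetical placeholders, since the real bit positions differ per MMU, and the helper is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration only. */
#define SK_PAGE_NO_CACHE  0x020u
#define SK_PAGE_COHERENT  0x100u

/* Strip the coherence bit for uncached BATs or when the CPU doesn't need it. */
static uint32_t bat_flags(uint32_t flags, bool cpu_needs_coherent)
{
    if ((flags & SK_PAGE_NO_CACHE) || !cpu_needs_coherent)
        flags &= ~SK_PAGE_COHERENT;
    return flags;
}
```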

Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-28 17:15:52 +11:00
Dave Kleikamp
9ba0fdbfae powerpc: is_hugepage_only_range() must account for both 4kB and 64kB slices

The subpage_prot syscall fails on second and subsequent calls for a given
region, because is_hugepage_only_range() is mis-identifying the 4 kB
slices when the process has a 64 kB page size.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-16 16:15:16 +11:00
Ingo Molnar
fe333321e2 powerpc: Change u64/s64 to a long long integer type
Convert arch/powerpc/ over to long long based u64:

 -#ifdef __powerpc64__
 -# include <asm-generic/int-l64.h>
 -#else
 -# include <asm-generic/int-ll64.h>
 -#endif
 +#include <asm-generic/int-ll64.h>

This will avoid recurring spurious warnings in core kernel code that
come up when people test on their own hardware (i.e. x86 in ~98% of the
cases). This is what x86 uses, and it generally helps keep 64-bit code
32-bit clean too.
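A small sketch of the benefit. If `u64` is `unsigned long long` on both 32- and 64-bit builds, as the int-ll64.h convention makes it, a single `%llu` format string is correct everywhere. If `u64` were `unsigned long` on 64-bit only, the same code would draw format warnings there.

```c
#include <assert.h>
#include <stdio.h>

/* The int-ll64 convention: u64 is unsigned long long on every build. */
typedef unsigned long long u64;

/* One format string works for 32- and 64-bit builds alike. */
static int format_u64(char *buf, size_t len, u64 v)
{
    return snprintf(buf, len, "%llu", v);
}
```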

[Adjusted to not impact user mode (from paulus) - sfr]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-13 14:47:59 +11:00
Benjamin Herrenschmidt
30aae739a9 Merge commit 'kumar/kumar-next' into next
2009-01-13 13:59:03 +11:00
Anton Vorontsov
7021d86afa powerpc/mm: Make clear_fixmap() actually work
The clear_fixmap() routine issues map_page() with flags set to 0.
Currently this causes a BUG_ON() inside the map_page(), as it assumes
that a PTE should be clear before mapping.

This patch makes map_page() trigger the BUG_ON() only if the
flags are set.
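A simplified model of the relaxed check, not the kernel source. Installing a new mapping over a live PTE is still treated as a bug, but writing a zero PTE with zero flags, which is how clear_fixmap() tears a mapping down, is allowed. The function returns -1 where the kernel would hit BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pte_model_t;

static int map_page_model(pte_model_t *pte, uint64_t pfn_bits, uint64_t flags)
{
    if (flags != 0 && *pte != 0)
        return -1;              /* kernel: BUG_ON() */
    *pte = pfn_bits | flags;    /* (0, 0) clears the entry */
    return 0;
}
```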

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:17 +11:00
Benjamin Herrenschmidt
4a0826824b powerpc: Fix missing semicolons in mmu_decl.h
This is a brown paper bag from one of my earlier patches that
breaks the build on 40x and 8xx.

And yes, I've now added 40x and 8xx to my list of test configs :-)

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:17 +11:00
Dave Liu
d6a09e0cd6 powerpc: Remove the redundant _tlbil_pid at SMP case
Signed-off-by: Dave Liu <daveliu@freescale.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:13 +11:00
Dave Hansen
893473df78 powerpc/mm: Cleanup careful_allocation(): consolidate memset()
Both users of careful_allocation() immediately memset() the
result.  So, just do it in one place.

Also give careful_allocation() a 'z' prefix to bring it in
line with kzalloc() and friends.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:09 +11:00
Dave Hansen
0be210fd66 powerpc/mm: Make careful_allocation() return virtual addrs
Since we memset() the result in both of the uses here,
just make careful_allocation() return a virtual address.
Also, add a separate variable to store the physical
address that comes back from the lmb_alloc() functions.
This makes it less likely that someone will screw it up by
forgetting to convert before returning, since the vaddr
is always in a void* and the paddr is always in an
unsigned long.

I admit this is arbitrary since one of its users needs
a paddr and one a vaddr, but it does remove a good
number of casts.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:08 +11:00
Dave Hansen
5d21ea2b0e powerpc/mm:: Cleanup careful_allocation(): bootmem already panics
If we fail a bootmem allocation, the bootmem code itself
panics.  No need to redo it here.

Also change the wording of the other panic.  We don't
strictly have to allocate memory on the specified node.
It is just a hint and that node may not even *have* any
memory on it.  In that case we can and do fall back to
other nodes.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:08 +11:00
Dave Hansen
c555e520ef powerpc/mm: Add better comment on careful_allocation()
The behavior in careful_allocation() really confused me
at first.  Add a comment to hopefully make it easier
on the next doofus that looks at it.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:08 +11:00
Trent Piepho
6fd8be4bf7 powerpc/fsl-booke: Remove num_tlbcam_entries
This is a global variable defined in fsl_booke_mmu.c with a value that gets
initialized in assembly code in head_fsl_booke.S.

It's never used.

If some code ever does want to know the number of entries in TLB1, then
"numcams = mfspr(SPRN_TLB1CFG) & 0xfff", is a whole lot simpler than a
global initialized during kernel boot from assembly.

Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-07 15:33:07 -06:00
Trent Piepho
19f5465e82 powerpc/fsl-booke: Don't hard-code size of struct tlbcam
Some assembly code in head_fsl_booke.S hard-coded the size of struct tlbcam
to 20 when it indexed the TLBCAM table.  Anyone changing the size of struct
tlbcam would not know to expect that.

The kernel already has a system to get the size of C structures into
assembly language files, asm-offsets, so let's use it.

The definition of the struct gets moved to a header, so that asm-offsets.c
can include it.

Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-07 15:33:06 -06:00
Gary Hade
c04fc586c1 mm: show node to memory section relationship with symlinks in sysfs
Show node to memory section relationship with symlinks in sysfs

Add /sys/devices/system/node/nodeX/memoryY symlinks for all
the memory sections located on nodeX.  For example:
/sys/devices/system/node/node1/memory135 -> ../../memory/memory135
indicates that memory section 135 resides on node1.

Also revises documentation to cover this change as well as updating
Documentation/ABI/testing/sysfs-devices-memory to include descriptions
of memory hotremove files 'phys_device', 'phys_index', and 'state'
that were previously not described there.

In addition to it always being a good policy to provide users with
the maximum possible amount of physical location information for
resources that can be hot-added and/or hot-removed, the following
are some (but likely not all) of the user benefits provided by
this change.
Immediate:
  - Provides information needed to determine the specific node
    on which a defective DIMM is located.  This will reduce system
    downtime when the node or defective DIMM is swapped out.
  - Prevents unintended onlining of a memory section that was
    previously offlined due to a defective DIMM.  This could happen
    during node hot-add when the user or node hot-add assist script
    onlines _all_ offlined sections due to user or script inability
    to identify the specific memory sections located on the hot-added
    node.  The consequences of reintroducing the defective memory
    could be ugly.
  - Provides information needed to vary the amount and distribution
    of memory on specific nodes for testing or debugging purposes.
Future:
  - Will provide information needed to identify the memory
    sections that need to be offlined prior to physical removal
    of a specific node.

Symlink creation during boot was tested on 2-node x86_64, 2-node
ppc64, and 2-node ia64 systems.  Symlink creation during physical
memory hot-add was tested on a 2-node x86_64 system.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:00 -08:00
Mel Gorman
3340289ddf mm: report the MMU pagesize in /proc/pid/smaps
The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the
kernel to back a VMA.  This matches the size used by the MMU in the
majority of cases.  However, one counter-example occurs on PPC64 kernels
where a kernel using 64K as its base pagesize may still use 4K pages for
the MMU on older processors.  To distinguish the two, this patch reports
MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps.
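Here is a minimal sketch of reading the two fields back from userspace. It assumes the usual `Name:   <value> kB` line format of /proc/pid/smaps, and `smaps_field_kb()` is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Return the value in kB for `key` from one smaps line, or -1 if the
 * line is not that field or cannot be parsed. */
static long smaps_field_kb(const char *line, const char *key)
{
	size_t klen = strlen(key);
	long kb;

	if (strncmp(line, key, klen) != 0 || line[klen] != ':')
		return -1;
	if (sscanf(line + klen + 1, " %ld kB", &kb) != 1)
		return -1;
	return kb;
}
```

Read /proc/self/smaps line by line with fgets() and compare the two values for each VMA. That shows any mapping where KernelPageSize (e.g. 64) differs from MMUPageSize (e.g. 4).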

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:58:58 -08:00
Linus Torvalds
3c92ec8ae9 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits)
  powerpc/44x: Support 16K/64K base page sizes on 44x
  powerpc: Force memory size to be a multiple of PAGE_SIZE
  powerpc/32: Wire up the trampoline code for kdump
  powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M
  powerpc/32: Allow __ioremap on RAM addresses for kdump kernel
  powerpc/32: Setup OF properties for kdump
  powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs()
  powerpc: Prepare xmon_save_regs for use with kdump
  powerpc: Remove default kexec/crash_kernel ops assignments
  powerpc: Make default kexec/crash_kernel ops implicit
  powerpc: Setup OF properties for ppc32 kexec
  powerpc/pseries: Fix cpu hotplug
  powerpc: Fix KVM build on ppc440
  powerpc/cell: add QPACE as a separate Cell platform
  powerpc/cell: fix build breakage with CONFIG_SPUFS disabled
  powerpc/mpc5200: fix error paths in PSC UART probe function
  powerpc/mpc5200: add rts/cts handling in PSC UART driver
  powerpc/mpc5200: Make PSC UART driver update serial errors counters
  powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver
  powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver
  ...

Fix trivial conflict in drivers/char/Makefile as per Paul's directions
2008-12-28 16:54:33 -08:00
Ilya Yanok
ca9153a3a2 powerpc/44x: Support 16K/64K base page sizes on 44x
This adds support for 16k and 64k page sizes on PowerPC 44x processors.

The PGDIR table is much smaller than a page when using 16k or 64k
pages (512 and 32 bytes respectively) so we allocate the PGDIR with
kzalloc() instead of __get_free_pages().

One PTE table covers rather a large memory area when using 16k or 64k
pages (32MB or 512MB respectively), so we can easily put FIXMAP and
PKMAP in the area covered by one PTE table.
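The sizes quoted above follow from simple arithmetic. The sketch below assumes a 32-bit virtual address space, 8-byte PTEs, 4-byte PGD entries and one page per PTE table. Under those assumptions it reproduces the 512/32-byte PGDIR and 32MB/512MB coverage figures.

```c
#include <stdint.h>

/* Assumptions for this sketch, not taken from the patch itself. */
#define VA_BITS        32
#define PTE_ENTRY_SIZE 8u
#define PGD_ENTRY_SIZE 4u

/* Bytes of virtual address space mapped by one page-sized PTE table. */
static uint64_t pte_table_coverage(uint64_t page_size)
{
	uint64_t entries = page_size / PTE_ENTRY_SIZE;

	return entries * page_size;
}

/* Size in bytes of a PGD that covers the whole address space. */
static uint64_t pgdir_bytes(uint64_t page_size)
{
	uint64_t pgd_entries = (UINT64_C(1) << VA_BITS) / pte_table_coverage(page_size);

	return pgd_entries * PGD_ENTRY_SIZE;
}
```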

Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Vladimir Panfilov <pvr@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-29 09:53:25 +11:00
James Morris
cbacc2c7f0 Merge branch 'next' into for-linus 2008-12-25 11:40:09 +11:00
Dale Farnsworth
ccdcef72c2 powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M
Add the ability for a classic ppc kernel to be loaded at an address
of 32MB.  This is done by fixing a few places that assume we are loaded
at address 0, and by changing several uses of KERNELBASE to use
PAGE_OFFSET, instead.

Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-23 15:13:29 +11:00
Anton Vorontsov
01695a9687 powerpc/32: Allow __ioremap on RAM addresses for kdump kernel
While it is good for debugging to catch bogus users of ioremap, for
kdump support it is more convenient to use __ioremap for
copy_oldmem_page() (exactly as we do for PPC64 currently).

Note that copy_oldmem_page() calls __ioremap with flags set to '0',
so it should be safe with regard to the caches.

The other option is to use kmap_atomic_pfn()[1], but it will not work
for kernels compiled without HIGHMEM.

For example, on a board with 256MB of RAM and crashkernel=64M@32M, the
!HIGHMEM capture kernel maps only the 0-96M range, which does not include
all the memory needed to capture the dump.  Accessing anything above 96M
will therefore cause faults.
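The 96M boundary is simply offset + size for crashkernel=64M@32M. The sketch below shows that arithmetic. `parse_size()` and `parse_crashkernel()` are hypothetical helpers, loosely modeled on memparse()-style K/M/G suffix handling. They are not the kernel's actual command-line parser.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse "<n>[K|M|G]" into bytes, advancing *end. */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': v <<= 10; /* fall through */
	case 'M': v <<= 10; /* fall through */
	case 'K': v <<= 10; (*end)++; break;
	}
	return v;
}

/* Parse "size@offset" (e.g. "64M@32M"); returns false on malformed input. */
static bool parse_crashkernel(const char *arg, uint64_t *size, uint64_t *offset)
{
	char *p;

	*size = parse_size(arg, &p);
	if (*p != '@')
		return false;
	*offset = parse_size(p + 1, &p);
	return *p == '\0';
}

/* End (exclusive) of the RAM the capture kernel can reach without HIGHMEM. */
static uint64_t capture_lowmem_end(uint64_t size, uint64_t offset)
{
	return offset + size;
}
```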

[1] http://ozlabs.org/pipermail/linuxppc-dev/2007-November/046747.html

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-23 15:13:29 +11:00
Benjamin Herrenschmidt
a14953597b powerpc: Fix missing 'blr' in _tlbia()
A rework of the MMU code dropped a much-missed 'blr' instruction.

Brown-Paper-Bag-Worn-By: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2008-12-21 02:54:25 -07:00
Benjamin Herrenschmidt
64b3d0e812 powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED
Currently, we never set _PAGE_COHERENT in the PTEs, we just OR it in
in the hash code based on some CPU feature bit.  We also manipulate
_PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places.

This changes the logic so that instead, the PTE now contains
_PAGE_COHERENT for all normal RAM pages that have I = 0 on platforms
that need it.  The hash code clears it if the feature bit is not set.

It also adds some clean accessors to set up various valid combinations
of access flags and changes various bits of code to use them instead.

This should help ensure that the PTE actually contains the bit
combinations that we really want.

I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead
set it explicitly from the TLB miss.  I will ultimately remove it
completely as it appears that it might not be needed after all,
but in the meantime, having it in the TLB miss makes things a
lot easier.
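The intended division of labour can be sketched as follows. The bit values and accessor names are hypothetical and chosen for illustration. Only the roles of the COHERENT, NO_CACHE and GUARDED bits come from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real _PAGE_* values differ per MMU family. */
#define SK_PAGE_NO_CACHE 0x01u  /* I: caching inhibited */
#define SK_PAGE_COHERENT 0x02u  /* M: memory coherence required */
#define SK_PAGE_GUARDED  0x04u  /* G: guarded */

typedef uint32_t sk_pgprot_t;

/* Normal RAM (I = 0): the PTE itself carries COHERENT. */
static sk_pgprot_t sk_prot_ram(sk_pgprot_t base)
{
	return (base & ~(SK_PAGE_NO_CACHE | SK_PAGE_GUARDED)) | SK_PAGE_COHERENT;
}

/* I/O mapping: cache-inhibited and guarded, never coherent. */
static sk_pgprot_t sk_prot_noncached(sk_pgprot_t base)
{
	return (base & ~SK_PAGE_COHERENT) | SK_PAGE_NO_CACHE | SK_PAGE_GUARDED;
}

/* What the hash code loads into the hardware PTE: drop COHERENT when the
 * CPU feature saying it is needed is absent. */
static sk_pgprot_t sk_hash_flags(sk_pgprot_t pte, bool cpu_needs_coherent)
{
	return cpu_needs_coherent ? pte : (pte & ~SK_PAGE_COHERENT);
}
```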

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt
7752035180 powerpc/mm: Runtime allocation of mmu context maps for nohash CPUs
This makes the MMU context code used for CPUs with no hash table
(except 603) dynamically allocate the various maps used to track
the state of contexts.

Only the main free map and CPU 0 stale map are allocated at boot
time.  Other CPU maps are allocated when those CPUs are brought up
and freed if they are unplugged.

This also moves the initialization of the MMU context management
slightly later during the boot process, which should be fine as
it's really only needed when userland is first started anyway.
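The sketch below models the allocation policy described above in userspace. calloc()/free() stand in for the kernel allocator, and plain functions stand in for CPU hotplug callbacks. The names and the context count are assumptions.

```c
#include <stdbool.h>
#include <stdlib.h>

#define SK_NR_CPUS     4
#define SK_NR_CONTEXTS 256   /* assumed context count */
#define SK_MAP_LONGS   ((SK_NR_CONTEXTS + 8 * sizeof(unsigned long) - 1) / \
			(8 * sizeof(unsigned long)))

static unsigned long *context_map;              /* global free/used map */
static unsigned long *stale_map[SK_NR_CPUS];    /* per-CPU stale maps */

/* Boot: allocate only the main map and CPU 0's stale map. */
static bool mmu_context_init(void)
{
	context_map = calloc(SK_MAP_LONGS, sizeof(unsigned long));
	stale_map[0] = calloc(SK_MAP_LONGS, sizeof(unsigned long));
	return context_map && stale_map[0];
}

/* CPU brought up: allocate its stale map if it does not have one. */
static bool mmu_context_cpu_up(int cpu)
{
	if (!stale_map[cpu])
		stale_map[cpu] = calloc(SK_MAP_LONGS, sizeof(unsigned long));
	return stale_map[cpu] != NULL;
}

/* CPU unplugged: free its stale map. */
static void mmu_context_cpu_down(int cpu)
{
	free(stale_map[cpu]);
	stale_map[cpu] = NULL;
}
```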

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt
760ec0e02d powerpc/44x: No need to mask MSR:CE, ME or DE in _tlbil_va on 440
The handlers for Critical, Machine Check or Debug interrupts
will save and restore MMUCR nowadays, thus we only need to
disable normal interrupts when invalidating TLB entries.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt
2a4aca1144 powerpc/mm: Split low level tlb invalidate for nohash processors
Currently, the various forms of low level TLB invalidations are all
implemented in misc_32.S for 32-bit processors, in a fairly scary
mess of #ifdefs and with interesting duplication such as a whole
bunch of code for FSL _tlbie and _tlbia which are no longer used.

This moves things around such that _tlbie is now defined in
hash_low_32.S and is only used by the 32-bit hash code, and all
nohash CPUs use the various _tlbil_* forms that are now moved to
a new file, tlb_nohash_low.S.

I moved all the definitions for that stuff out of
include/asm/tlbflush.h and into mm/mmu_decl.h, as they are really
internal mm stuff.

The code should have no functional changes.  I kept some variants
inline for trivial forms on things like 40x and 8xx.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt
f048aace29 powerpc/mm: Add SMP support to no-hash TLB handling
This commit moves the whole no-hash TLB handling out of line into a
new tlb_nohash.c file, and implements some basic SMP support using
IPIs and/or broadcast tlbivax instructions.

Note that I'm using local invalidations for D->I cache coherency.

At worst, if another processor is trying to execute the same code and
has the old entry in its TLB, it will just take a fault and re-do
the TLB flush locally (it won't re-do the cache flush in any case).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt
7c03d653cd powerpc/mm: Introduce MMU features
We will soon run out of CPU feature bits and I need to add some new
ones for various MMU related bits, so this patch separates the MMU
features from the CPU features.  I moved over the 32-bit MMU related
ones, added base features for MMU type families, but didn't move
over any 64-bit only feature yet.
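Here is a sketch of the split, with hypothetical names and bit values. It only shows that MMU features get their own mask and their own test function, so they no longer consume CPU feature bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bits: MMU bits live in a separate mask, so they can reuse
 * low bit positions without colliding with CPU feature bits. */
#define SK_CPU_FTR_ALTIVEC     (UINT64_C(1) << 0)
#define SK_MMU_FTR_TYPE_HASH32 (UINT32_C(1) << 0)
#define SK_MMU_FTR_TYPE_44x    (UINT32_C(1) << 1)

struct sk_cpu_spec {
	uint64_t cpu_features;   /* CPU features, as before */
	uint32_t mmu_features;   /* new, separate MMU feature mask */
};

static bool sk_cpu_has_feature(const struct sk_cpu_spec *s, uint64_t f)
{
	return (s->cpu_features & f) != 0;
}

static bool sk_mmu_has_feature(const struct sk_cpu_spec *s, uint32_t f)
{
	return (s->mmu_features & f) != 0;
}
```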

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt
2ca8cf7389 powerpc/mm: Rework context management for CPUs with no hash table
This reworks the context management code used by 4xx, 8xx and
Freescale BookE.  It adds support for SMP by implementing the
concept of a stale context map to lazily flush the TLB on
processors where a context may have been invalidated.  This
also contains the groundwork for generalizing such lazy TLB
flushing by just picking a new PID and marking the old one
stale.  This will be implemented later.

This is a first implementation that uses a global spinlock.

Ideally, we should try to get at least the fast path (context ID
already assigned) lockless or limited to a per context lock,
but for now this will do.

I tried to keep the UP case reasonably simple to avoid adding
too much overhead to 8xx which does a lot of context stealing
since it effectively has only 16 PIDs available.
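The stale-map idea can be sketched in a few dozen lines. This single-threaded model is an illustration, not the kernel code. It omits the global spinlock, marks a stolen context stale on every CPU rather than only on the CPUs the victim ran on, and simulates TLB flushes with a counter. The names are hypothetical, and the 16-context limit mirrors the 8xx case mentioned above.

```c
#include <stdint.h>

#define CTX_NR_CPUS     2
#define CTX_NR_CONTEXTS 16                  /* e.g. 8xx: effectively 16 PIDs */
#define CTX_NONE        (-1)

struct ctx_mm { int id; };                  /* stand-in for mm->context */

static uint32_t ctx_used;                   /* bit n: context n is assigned */
static uint32_t ctx_stale[CTX_NR_CPUS];     /* bit n: CPU may hold stale entries for n */
static struct ctx_mm *ctx_owner[CTX_NR_CONTEXTS];
static int ctx_next;                        /* round-robin allocate/steal cursor */
static int ctx_flushes[CTX_NR_CPUS];        /* counts simulated local TLB flushes */

/* Stand-in for a local TLB flush of one PID on one CPU. */
static void ctx_local_flush(int cpu, int id)
{
	(void)id;
	ctx_flushes[cpu]++;
}

/* Take context ctx_next from its owner. Rather than flushing every CPU
 * now, mark the id stale everywhere; each CPU flushes it on next use. */
static int ctx_steal(void)
{
	int id = ctx_next;

	ctx_owner[id]->id = CTX_NONE;
	for (int cpu = 0; cpu < CTX_NR_CPUS; cpu++)
		ctx_stale[cpu] |= UINT32_C(1) << id;
	return id;
}

/* Context switch to `mm` on `cpu` (a real kernel would hold the lock). */
static void ctx_switch(int cpu, struct ctx_mm *mm)
{
	int id = mm->id;

	if (id == CTX_NONE) {
		if (ctx_used == (UINT32_C(1) << CTX_NR_CONTEXTS) - 1) {
			id = ctx_steal();
		} else {
			id = ctx_next;
			while (ctx_used & (UINT32_C(1) << id))
				id = (id + 1) % CTX_NR_CONTEXTS;
			ctx_used |= UINT32_C(1) << id;
		}
		ctx_next = (id + 1) % CTX_NR_CONTEXTS;
		ctx_owner[id] = mm;
		mm->id = id;
	}
	if (ctx_stale[cpu] & (UINT32_C(1) << id)) {
		ctx_local_flush(cpu, id);
		ctx_stale[cpu] &= ~(UINT32_C(1) << id);
	}
	/* A real implementation would now load `id` into the PID register. */
}
```

In this sketch, stealing never interrupts other CPUs. A CPU pays for the flush only when it next switches to the stolen ID.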

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:15 +11:00
Benjamin Herrenschmidt
5e696617c4 powerpc/mm: Split mmu_context handling
This splits the mmu_context handling between 32-bit hash based
processors, 64-bit hash based processors and everybody else.  This is
preliminary work for adding SMP support for BookE processors.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:15 +11:00
Benjamin Herrenschmidt
f63837f058 powerpc/mm: Remove flush_HPTE()
The function flush_HPTE() is used in only one place, the implementation
of DEBUG_PAGEALLOC on ppc32.

It's actually a dup of flush_tlb_page() though it's -slightly- more
efficient on hash based processors.  We remove it and replace it with
a direct call to the hash flush code on those processors and to
flush_tlb_page() for everybody else.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-16 15:53:34 +11:00
Benjamin Herrenschmidt
e41e811a79 powerpc/mm: Rename tlb_32.c and tlb_64.c to tlb_hash32.c and tlb_hash64.c
This renames the files to clarify the fact that they are used by
the hash based family of CPUs (the 603 is an exception in that
family but is still handled by that code).

This paves the way for the new tlb_nohash.c coming via a subsequent
commit.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-16 15:53:30 +11:00
Paul Mackerras
1e1c568d6c Merge branch 'merge' into next 2008-12-16 14:38:58 +11:00
Dave Hansen
a4c74ddd5e powerpc: Fix bootmem reservation on uninitialized node
careful_allocation() was calling into the bootmem allocator for
nodes which had not been fully initialized and caused a previous
bug:  http://patchwork.ozlabs.org/patch/10528/  So, I merged a
few broken out loops in do_init_bootmem() to fix it.  That changed
the code ordering.

I think this bug is triggered by having reserved areas for a node
which are spanned by another node's contents.  In the
mark_reserved_regions_for_nid() code, we attempt to reserve the
area for a node before we have allocated the NODE_DATA() for that
nid.  We do this since I reordered that loop.  I suck.

This is causing crashes at bootup on some systems, as reported
by Jon Tollefson.

This may only show up on systems that have 16GB pages
reserved, but it can probably happen on any system that is
trying to reserve large swaths of memory that happen to span other
nodes' contents.

This commit ensures that we do not touch bootmem for any node which
has not been initialized, and also removes a compile warning about
an unused variable.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-16 13:48:18 +11:00
Brian King
48f797de55 powerpc: Check for valid hugepage size in hugetlb_get_unmapped_area
It looks like most of the hugetlb code is doing the correct thing if
hugepages are not supported, but the mmap code is not.  If we get into
the mmap code when hugepages are not supported, such as in an LPAR
which is running Active Memory Sharing, we can oops the kernel.  This
fixes the oops being seen in this path.

oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: nfs(N) lockd(N) nfs_acl(N) sunrpc(N) ipv6(N) fuse(N) loop(N)
dm_mod(N) sg(N) ibmveth(N) sd_mod(N) crc_t10dif(N) ibmvscsic(N)
scsi_transport_srp(N) scsi_tgt(N) scsi_mod(N)
Supported: No
NIP: c000000000038d60 LR: c00000000003945c CTR: c0000000000393f0
REGS: c000000077e7b830 TRAP: 0300   Tainted: G
(2.6.27.5-bz50170-2-ppc64)
MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 44000448  XER: 20000001
DAR: c000002000af90a8, DSISR: 0000000040000000
TASK = c00000007c1b8600[4019] 'hugemmap01' THREAD: c000000077e78000 CPU: 6
GPR00: 0000001fffffffe0 c000000077e7bab0 c0000000009a4e78 0000000000000000
GPR04: 0000000000010000 0000000000000001 00000000ffffffff 0000000000000001
GPR08: 0000000000000000 c000000000af90c8 0000000000000001 0000000000000000
GPR12: 000000000000003f c000000000a73880 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000010000
GPR20: 0000000000000000 0000000000000003 0000000000010000 0000000000000001
GPR24: 0000000000000003 0000000000000000 0000000000000001 ffffffffffffffb5
GPR28: c000000077ca2e80 0000000000000000 c00000000092af78 0000000000010000
NIP [c000000000038d60] .slice_get_unmapped_area+0x6c/0x4e0
LR [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80
Call Trace:
[c000000077e7bbc0] [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80
[c000000077e7bc30] [c000000000107e30] .get_unmapped_area+0x64/0xd8
[c000000077e7bcb0] [c00000000010b140] .do_mmap_pgoff+0x140/0x420
[c000000077e7bd80] [c00000000000bf5c] .sys_mmap+0xc4/0x140
[c000000077e7be30] [c0000000000086b4] syscall_exit+0x0/0x40
Instruction dump:
fac1ffb0 fae1ffb8 fb01ffc0 fb21ffc8 fb41ffd0 fb61ffd8 fb81ffe0 fbc1fff0
fbe1fff8 f821fef1 f8c10158 f8e10160 <7d49002e> f9010168 e92d01b0 eb4902b0

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-16 13:48:18 +11:00
James Morris
ec98ce480a Merge branch 'master' into next
Conflicts:
	fs/nfsd/nfs4recover.c

Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().

Signed-off-by: James Morris <jmorris@namei.org>
2008-12-04 17:16:36 +11:00
Kumar Gala
0186f47e70 powerpc: Use RCU based pte freeing mechanism for all powerpc
Refactor the RCU based pte free code that was used on ppc64 to be used
on all powerpc.

Additionally refactor pte_free() & pte_free_kernel() into common code
between ppc32 & ppc64.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-03 20:46:35 +11:00
Kumar Gala
f4f3a1261a powerpc: hash_page_sync should only be used on SMP & STD_MMU_32
Clean up the ifdefs so we only use hash_page_sync if we have
CONFIG_SMP && CONFIG_PPC_STD_MMU_32.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-03 20:46:35 +11:00
Paul Mackerras
5274918855 Merge branch 'merge' 2008-12-03 20:11:06 +11:00
Linus Torvalds
03cfdb86ac Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  powerpc: Fix system calls on Cell entered with XER.SO=1
  powerpc/cell: Fix GDB watchpoints, again
  powerpc/mpic: Don't reset affinity for secondary MPIC on boot
  powerpc/cell/axon-msi: Retry on missing interrupt
  powerpc: Fix boot freeze on machine with empty memory node
  powerpc: Fix IRQ assignment for some PCIe devices
  powerpc/spufs: Fix spinning in spufs_ps_fault on signal
  powerpc/mpc832x_rdb: fix swapped ethernet ids
  powerpc: Use generic PHY driver for Marvell 88E1111 PHY on GE Fanuc SBC610
  powerpc/85xx: L2 cache size wrong in 8572DS dts
  powerpc/virtex: Update defconfigs
  powerpc/52xx: update defconfigs
  xsysace: Fix driver to use resource_size_t instead of unsigned long
  powerpc/virtex: fix various format/casting printk mismatches
  powerpc/mpc5200: fix bestcomm Kconfig dependencies
  powerpc/44x: Fix 460EX/460GT machine check handling
  powerpc/40x: Limit allocable DRAM during early mapping
2008-11-30 16:44:18 -08:00
Dave Hansen
4a6186696e powerpc: Fix boot freeze on machine with empty memory node
I got a bug report about a distro kernel not booting on a particular
machine.  It would freeze during boot:

> ...
> Could not find start_pfn for node 1
> [boot]0015 Setup Done
> Built 2 zonelists in Node order, mobility grouping on.  Total pages: 123783
> Policy zone: DMA
> Kernel command line:
> [boot]0020 XICS Init
> [boot]0021 XICS Done
> PID hash table entries: 4096 (order: 12, 32768 bytes)
> clocksource: timebase mult[7d0000] shift[22] registered
> Console: colour dummy device 80x25
> console handover: boot [udbg0] -> real [hvc0]
> Dentry cache hash table entries: 1048576 (order: 7, 8388608 bytes)
> Inode-cache hash table entries: 524288 (order: 6, 4194304 bytes)
> freeing bootmem node 0

I've reproduced this on 2.6.27.7.  It is caused by commit
8f64e1f2d1 ("powerpc: Reserve in bootmem
lmb reserved regions that cross NUMA nodes").

The problem is that Jon took a loop which was (in pseudocode):

	for_each_node(nid)
		NODE_DATA(nid) = careful_alloc(nid);
		setup_bootmem(nid);
		reserve_node_bootmem(nid);

and broke it up into:

	for_each_node(nid)
		NODE_DATA(nid) = careful_alloc(nid);
		setup_bootmem(nid);
	for_each_node(nid)
		reserve_node_bootmem(nid);

The issue comes in when the 'careful_alloc()' is called on a node with
no memory.  It falls back to using bootmem from a previously-initialized
node.  But, bootmem has not yet been reserved when Jon's patch is
applied.  It gives back bogus memory (0xc000000000000000) and pukes
later in boot.

The following patch collapses the loop back together.  It also breaks
the mark_reserved_regions_for_nid() code out into a function and adds
some comments.  I think a big part of why this bug was introduced is that
the for loop was too long and hard to read.

The actual bug fix here is the:

+		if (end_pfn <= node->node_start_pfn ||
+		    start_pfn >= node_end_pfn)
+			continue;
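The hunk is a standard half-open interval overlap test. Here it is isolated as a function. The function name is hypothetical, and the pfn variables follow the hunk.

```c
#include <stdbool.h>

/* A reserved range [start_pfn, end_pfn) concerns a node spanning
 * [node_start_pfn, node_end_pfn) only if the two ranges overlap. */
static bool reserve_hits_node(unsigned long start_pfn, unsigned long end_pfn,
			      unsigned long node_start_pfn,
			      unsigned long node_end_pfn)
{
	if (end_pfn <= node_start_pfn || start_pfn >= node_end_pfn)
		return false;   /* the hunk's "continue": skip this node */
	return true;
}
```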

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-01 09:40:18 +11:00
Al Viro
4ea8fb9c1c powerpc set_huge_psize() false positive
It is called only from __init code and calls __init code.  Incidentally,
it ought to be static to the file.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:35 -08:00
Robert Jennings
a6326e98a2 powerpc: Correct page-in counter for CMM with 64k pages
Linux will report the number of page-ins so that the hypervisor can
better determine partition memory pressure.  The hardware page size
and the OS page size can be different.  In the case where the hardware
page size is 4k and the OS is running with 64k pages the code in
commit 409001948d ("powerpc: Update
page-in counter for CMM") would under-report the number of pages.

This corrects the reporting to the hypervisor by incrementing the
page_in count by 1 << PAGE_FACTOR each time.
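With 64k OS pages on 4k hardware pages, each OS page-in corresponds to 16 hardware pages. The sketch below shows the corrected accounting. It assumes PAGE_FACTOR is defined as PAGE_SHIFT - HW_PAGE_SHIFT, and the counter is a stand-in for the VPA field.

```c
#include <stdint.h>

/* Assumed values for a 64k-page kernel on hardware with 4k pages. */
#define SK_PAGE_SHIFT    16
#define SK_HW_PAGE_SHIFT 12
#define SK_PAGE_FACTOR   (SK_PAGE_SHIFT - SK_HW_PAGE_SHIFT)

static uint64_t page_ins;   /* stand-in for the VPA page-in field */

/* One OS page-in brings in 1 << PAGE_FACTOR hardware pages. */
static void account_page_in(void)
{
	page_ins += UINT64_C(1) << SK_PAGE_FACTOR;
}
```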

Reported-by: Andrew Theurer <habanero@linux.vnet.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-19 16:05:05 +11:00
David Howells
1330deb0f6 CRED: Wrap task credential accesses in the PowerPC arch
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:39 +11:00
Grant Erickson
5907630ffc powerpc/40x: Limit allocable DRAM during early mapping
If the size of DRAM is not an exact power of two, we may not have
covered DRAM in its entirety with large 16 MiB and 4 MiB pages.  If that
is the case, we can get non-recoverable page faults when doing the
final PTE mappings for the non-large page PTEs.

Consequently, we restrict the top end of DRAM currently allocable
by updating '__initial_memory_limit_addr' so that calls to the LMB to
allocate PTEs for "tail" coverage with normal-sized pages (or other
reasons) do not attempt to allocate outside the allowed range.
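The following is a simplified model of the coverage problem. It assumes DRAM starts at address 0 and is mapped greedily with 16 MiB pages and then 4 MiB pages. The function is hypothetical and only illustrates where the early allocation limit would end up.

```c
#include <stdint.h>

#define SZ_4M  (UINT32_C(4) << 20)
#define SZ_16M (UINT32_C(16) << 20)

/* Bytes of DRAM covered by large pages; memory above the returned value
 * is not yet mapped, so early allocations must stay below it. */
static uint32_t large_page_coverage(uint32_t dram_size)
{
	uint32_t mapped = 0;

	while (dram_size - mapped >= SZ_16M)
		mapped += SZ_16M;
	while (dram_size - mapped >= SZ_4M)
		mapped += SZ_4M;
	return mapped;
}
```

For 97 MiB of DRAM, six 16 MiB pages cover 96 MiB and the remaining 1 MiB is too small for a 4 MiB page, so early allocations would have to stay below 96 MiB.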

Signed-off-by: Grant Erickson <gerickson@nuovations.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2008-11-13 10:10:56 -05:00
Jon Tollefson
7d4320f3d5 powerpc: Hugetlb pgtable cache access cleanup
Andrew Morton suggested that using a macro that makes an array
reference look like a function call makes it harder to understand the
code.

This therefore removes the huge_pgtable_cache(psize) macro and
replaces its uses with pgtable_cache[HUGE_PGTABLE_INDEX(psize)].

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-06 09:49:39 +11:00
Brian King
409001948d powerpc: Update page-in counter for CMM
A new field has been added to the VPA as a method for the client OS to
communicate to firmware the number of page-ins it is performing when
running collaborative memory overcommit.  The hypervisor will use this
information to better determine if a partition is experiencing memory
pressure and needs more memory allocated to it.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-05 22:08:28 +11:00
Jon Tollefson
4792adbac9 powerpc: Don't use a 16G page if beyond mem= limits
If mem= is used on the boot command line to limit memory then the memory block where a 16G page resides may not be available.

Thanks to Michael Ellerman for finding the problem.

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-22 15:01:21 +11:00
Benjamin Herrenschmidt
a02efb906d Merge commit 'origin' into master
Manual merge of:

	arch/powerpc/Kconfig
	arch/powerpc/include/asm/page.h
2008-10-21 15:52:04 +11:00
Milton Miller
fe55249d17 powerpc: Always trim numa memory to lmb_end_of_DRAM()
numa_enforce_memory_limit tried to be smart and only call lmb_end_of_DRAM
when a memory limit was set via mem= on the command line.  However,
the early boot code will also limit memory added to the lmb system
when iommu=off is specified.  When this happens, the page allocator
is given pages not in the linear mapping and this results in a fatal
data reference to the unmapped page.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-21 15:19:12 +11:00
Jon Tollefson
e81703724a powerpc/numa: Make memory reserve code more robust
Adjust amount to reserve based on previous nodes for reserves spanning
multiple nodes. Check if the node active range is empty before attempting
to pass the reserve to bootmem.  In practice the range shouldn't be empty,
but to be sure we check.

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-21 15:17:48 +11:00
Badari Pulavarty
71088785c6 mm: cleanup to make remove_memory() arch-neutral
There is nothing architecture specific about remove_memory().
remove_memory() is common to all architectures which support
hotplug memory remove.  Instead of duplicating it in every architecture,
collapse them into an arch-neutral function.

[akpm@linux-foundation.org: fix the export]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Gary Hade <garyhade@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 08:50:25 -07:00
David Gibson
f5ea64dcba powerpc: Get USE_STRICT_MM_TYPECHECKS working again
The typesafe version of the powerpc pagetable handling (with
USE_STRICT_MM_TYPECHECKS defined) has bitrotted again.  This patch
makes a bunch of small fixes to get it building again.

It's still not enabled by default as gcc still generates worse
code with it for some reason.
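The technique behind USE_STRICT_MM_TYPECHECKS fits in a few lines. This is a sketch. The kernel's constructor is spelled `__pte()`, renamed `to_pte()` here to avoid a reserved identifier in userspace, and the 64-bit PTE type is an assumption.

```c
#include <stdint.h>

#ifdef USE_STRICT_MM_TYPECHECKS
/* Strict: a one-member struct is a distinct type, so mixing a pte_t
 * with a plain integer is a compile-time error. */
typedef struct { uint64_t pte; } pte_t;
#define pte_val(x) ((x).pte)
#define to_pte(x)  ((pte_t) { (x) })
#else
/* Default: a bare integer, with no type checking. */
typedef uint64_t pte_t;
#define pte_val(x) (x)
#define to_pte(x)  ((pte_t)(x))
#endif

/* Code that only uses the accessors builds the same way in either mode. */
static pte_t pte_add_flags(pte_t pte, uint64_t flags)
{
	return to_pte(pte_val(pte) | flags);
}
```

Bitrot of the kind described above happens when code bypasses the accessors, which only the strict build catches.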

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-14 10:35:27 +11:00
Jon Tollefson
8f64e1f2d1 powerpc: Reserve in bootmem lmb reserved regions that cross NUMA nodes
If there are multiple reserved memory blocks via lmb_reserve() that are
contiguous addresses and on different NUMA nodes we are losing track of which
address ranges to reserve in bootmem on which node.  I discovered this
when I recently got to try 16GB huge pages on a system with more than 2 nodes.

When scanning the device tree in early boot we call lmb_reserve() with
the addresses of the 16G pages that we find so that the memory doesn't
get used for something else.  For example, the addresses for the pages
could be 4000000000, 4400000000, 4800000000, 4C00000000, etc.: 8 pages,
one on each of eight nodes.  After all the pages have been reserved,
the lmb will look something like the following:

lmb_dump_all:
    memory.cnt            = 0x2
    memory.size           = 0x3e80000000
    memory.region[0x0].base       = 0x0
                      .size     = 0x1e80000000
    memory.region[0x1].base       = 0x4000000000
                      .size     = 0x2000000000
    reserved.cnt          = 0x5
    reserved.size         = 0x3e80000000
    reserved.region[0x0].base       = 0x0
                      .size     = 0x7b5000
    reserved.region[0x1].base       = 0x2a00000
                      .size     = 0x78c000
    reserved.region[0x2].base       = 0x328c000
                      .size     = 0x43000
    reserved.region[0x3].base       = 0xf4e8000
                      .size     = 0xb18000
    reserved.region[0x4].base       = 0x4000000000
                      .size     = 0x2000000000

The reserved.region[0x4] contains the 16G pages.  In
arch/powerpc/mm/numa.c: do_init_bootmem() we loop through each of the
node numbers looking for the reserved regions that belong to the
particular node.  It is not able to identify region 0x4 as being a part
of each of the 8 nodes, because it assumes that a reserved region lies
on only a single node.

This patch moves the reserved region loop out of the loop that goes
over each node.  It looks up the active region containing the start
of the reserved region.  If the reserved region extends past that active
region, it adjusts the size and gets the next active region containing it.
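The sketch below shows that walk. It assumes the active ranges are sorted and non-overlapping, and it returns per-node pieces instead of calling into bootmem. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct active_range { uint64_t start, end; int nid; };   /* [start, end) */
struct node_reserve { int nid; uint64_t start, size; };

/* Split [start, start + size) into per-node pieces by walking the sorted
 * active ranges. Returns the number of pieces written (at most max_out). */
static size_t split_reserve(const struct active_range *ranges, size_t n,
			    uint64_t start, uint64_t size,
			    struct node_reserve *out, size_t max_out)
{
	uint64_t end = start + size;
	size_t k = 0;

	for (size_t i = 0; i < n && start < end && k < max_out; i++) {
		if (ranges[i].end <= start || ranges[i].start >= end)
			continue;                   /* range does not touch the region */
		if (start < ranges[i].start)
			start = ranges[i].start;    /* skip a hole between ranges */
		uint64_t piece_end = end < ranges[i].end ? end : ranges[i].end;
		out[k].nid = ranges[i].nid;
		out[k].start = start;
		out[k].size = piece_end - start;
		k++;
		start = piece_end;              /* continue in the next range */
	}
	return k;
}
```

Take eight 16G ranges starting at 0x4000000000 and the 0x2000000000-byte reservation from the dump above. The walk yields eight 16G pieces, one per node.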

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-10 15:55:19 +11:00
Roland Dreier
a880e76233 powerpc: Avoid integer overflow in page_is_ram()
Commit 8b150478 ("ppc: make phys_mem_access_prot() work with pfns
instead of addresses") fixed page_is_ram() in arch/ppc to avoid overflow
for addresses above 4G on 32-bit kernels.  However arch/powerpc's
page_is_ram() is missing the same fix -- it computes a physical address
by doing pfn << PAGE_SHIFT, which overflows if pfn corresponds to a page
above 4G.

In particular this causes pages above 4G to be mapped with the wrong
caching attribute; for example many ppc440-based SoCs have PCI space
above 4G, and mmap()ing MMIO space may end up with a mapping that has
caching enabled.

Fix this by working with the pfn and avoiding the conversion to
physical address that causes the overflow.  This patch compares the
pfn to max_pfn, which is a semantic change from the old code -- that
code compared the physical address to high_memory, which corresponds
to max_low_pfn.  However, I think that was another bug, since
highmem pages are still RAM.
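The sketch below shows the overflow and the fix, using uint32_t to model a 32-bit unsigned long. The max_pfn value (2 GiB of RAM) and the 768 MiB lowmem limit used in the usage below are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12
typedef uint32_t sk_ulong;                    /* models a 32-bit unsigned long */

static const sk_ulong sk_max_pfn = 0x80000;   /* assumed: 2 GiB of RAM */

/* Buggy form: pfn << PAGE_SHIFT wraps for pages at or above 4 GiB. */
static bool page_is_ram_buggy(sk_ulong pfn, sk_ulong high_memory_paddr)
{
	sk_ulong paddr = pfn << SK_PAGE_SHIFT;

	return paddr < high_memory_paddr;
}

/* Fixed form: compare page frame numbers, never forming the address. */
static bool page_is_ram_fixed(sk_ulong pfn)
{
	return pfn < sk_max_pfn;
}
```

A pfn of 0x100000 (4 GiB, e.g. PCI space) shifts to 0 in 32 bits, so the buggy check reports it as RAM. That is how such a page could end up mapped with caching enabled.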

Reported-by: vb <vb@vsbe.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-07 14:26:18 +11:00
Becky Bruce
4ee7084eb1 POWERPC: Allow 32-bit hashed pgtable code to support 36-bit physical
This rearranges a bit of code, and adds support for
36-bit physical addressing for configs that use a
hashed page table.  The 36-bit physical support is not
enabled by default on any config - it must be
explicitly enabled via the config system.

This patch *only* expands the page table code to accommodate
large physical addresses on 32-bit systems and enables the
PHYS_64BIT config option for 86xx.  It does *not*
allow you to boot a board with more than about 3.5GB of
RAM - for that, SWIOTLB support is also required (and
coming soon).

Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-09-24 16:29:44 -05:00
Becky Bruce
82331ab15f powerpc/85xx: fix build warning, remove silly cast
This fixes a build warning when PHYS_64BIT is enabled, and removes an
unnecessary cast to phys_addr_t (the variable being cast is already
a phys_addr_t)

Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-09-16 10:01:35 -05:00
David Gibson
0b26425ce1 powerpc: Clean up hugepage pagetable allocation for powerpc with 16G pages
There is a small bug in the handling of 16G hugepages recently added
to the kernel.  This doesn't cause a crash or other user-visible
problems, but it does mean that more levels of pagetable are allocated
than makes sense for 16G pages.  The hugepage pagetables for the 16G
pages are allocated much lower in the pagetable tree than they should
be, with the intervening levels allocated with full pmd and pud pages
which will only ever have one entry filled in.

This corrects this problem, at the same time cleaning up the handling
of which level 64k versus 16M hugepage pagetables are allocated at.
The new way of formatting the tests should be more robust against
changes in pagetable structure, or any newly added hugepage sizes.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-09-15 11:08:47 -07:00
Becky Bruce
aaf4a9b0f7 powerpc: Rename PTE_SIZE to HPTE_SIZE
It's the size of the hardware PTE; make that clear in the name.

Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-09-15 11:08:42 -07:00
Paul Mackerras
549e8152de powerpc: Make the 64-bit kernel as a position-independent executable
This implements CONFIG_RELOCATABLE for 64-bit by building the kernel as
a position-independent executable (PIE) when it is set.  This involves
processing the dynamic relocations in the image in the early stages of
booting, even if the kernel is being run at the address it is linked at,
since the linker does not necessarily fill in words in the image for
which there are dynamic relocations.  (In fact the linker does fill in
such words for 64-bit executables, though not for 32-bit executables,
so in principle we could avoid calling relocate() entirely when we're
running a 64-bit kernel at the linked address.)

The dynamic relocations are processed by a new function relocate(addr),
where the addr parameter is the virtual address where the image will be
run.  In fact we call it twice; once before calling prom_init, and again
when starting the main kernel.  This means that reloc_offset() returns
0 in prom_init (since it has been relocated to the address it is running
at), which necessitated a few adjustments.
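The core of processing RELATIVE relocations can be sketched as below. This is a hypothetical userspace version, not the kernel's relocate(): it handles only `R_PPC64_RELATIVE` entries (type 22 in the ppc64 ELF ABI), assumes the image is linked at address 0 so that `r_offset` is an offset into the loaded image, and skips every other type.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define R_PPC64_RELATIVE 22	/* relocation type number from the ppc64 ELF ABI */

/* Same field layout as Elf64_Rela. */
struct rela64 {
	uint64_t r_offset;	/* where to patch, relative to the image here */
	uint64_t r_info;	/* symbol index << 32 | relocation type */
	int64_t r_addend;
};

/*
 * Patch every RELATIVE entry so the doubleword at r_offset holds
 * run_addr + r_addend. Returns how many entries were skipped because
 * their type is not handled by this sketch.
 */
static size_t apply_relative(uint8_t *image, uint64_t run_addr,
			     const struct rela64 *rela, size_t count)
{
	size_t skipped = 0;

	for (size_t i = 0; i < count; i++) {
		uint64_t val;

		if ((rela[i].r_info & 0xffffffffu) != R_PPC64_RELATIVE) {
			skipped++;
			continue;
		}
		val = run_addr + (uint64_t)rela[i].r_addend;
		memcpy(image + rela[i].r_offset, &val, sizeof(val));
	}
	return skipped;
}
```

Because each patched word is computed from `run_addr` plus the addend, running this pass even at the link address is harmless, and it is required whenever the linker has left those words unfilled. `memcpy` is used because an offset into a byte buffer need not be suitably aligned for a direct store.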

This also changes __va and __pa to use an equivalent definition that is
simpler.  With the relocatable kernel, PAGE_OFFSET and MEMORY_START are
constants (for 64-bit) whereas PHYSICAL_START is a variable (and
KERNELBASE ideally should be too, but isn't yet).

With this, relocatable kernels still copy themselves down to physical
address 0 and run there.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-09-15 11:08:38 -07:00
Chandru
cf00085d80 powerpc: Add support for dynamic reconfiguration memory in kexec/kdump kernels
Kdump kernel needs to use only those memory regions that it is allowed
to use (crashkernel, rtas, tce, etc.).  Each of these regions has its
own size and is currently listed in the 'linux,usable-memory'
property of each memory@xxx node of the device tree.

The ibm,dynamic-memory property of ibm,dynamic-reconfiguration-memory
node (on POWER6) now stores in it the representation for most of the
logical memory blocks with the size of each memory block being a
constant (lmb_size).  If any of the above-mentioned regions lies wholly
or partly within one of the lmbs in the ibm,dynamic-memory property,
those regions need to be identified within that lmb.

This makes the kernel recognize a new 'linux,drconf-usable-memory'
property added by kexec-tools.  Each entry in this property is of the
form of a count followed by that many (base, size) pairs for the above
mentioned regions.  The number of cells in the count value is given by
the #size-cells property of the root node.
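The property layout described above can be parsed roughly as follows. This sketch assumes the cells have already been converted from the device tree's big-endian encoding to host order, that each base is `#address-cells` wide while each size and the count are `#size-cells` wide, and it uses hypothetical names (`read_cells()`, `parse_lmb_usable()`, `struct mem_range`).

```c
#include <stddef.h>
#include <stdint.h>

struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Combine n 32-bit cells (most significant first) and advance *p. */
static uint64_t read_cells(const uint32_t **p, int n)
{
	uint64_t v = 0;

	while (n--)
		v = (v << 32) | *(*p)++;
	return v;
}

/*
 * Parse one lmb's entry: a count (size_cells wide) followed by that
 * many (base, size) pairs. Stores up to max ranges in out, advances *p
 * past the whole entry and returns the count found in the property.
 */
static uint64_t parse_lmb_usable(const uint32_t **p, int addr_cells,
				 int size_cells, struct mem_range *out,
				 size_t max)
{
	uint64_t count = read_cells(p, size_cells);

	for (uint64_t i = 0; i < count; i++) {
		uint64_t base = read_cells(p, addr_cells);
		uint64_t size = read_cells(p, size_cells);

		if (i < max) {
			out[i].base = base;
			out[i].size = size;
		}
	}
	return count;
}
```

Calling `parse_lmb_usable()` once per lmb, in the same order as the entries of ibm,dynamic-memory, walks the whole property, since each call leaves `*p` at the start of the next entry.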

Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-09-15 11:07:58 -07:00
Paul Mackerras
7e392f8c29 Merge branch 'linux-2.6' 2008-09-10 11:36:13 +10:00
Paul Mackerras
9e88ba4e45 powerpc: Only make kernel text pages of linear mapping executable
Commit bc033b63bb ("powerpc/mm: Fix
attribute confusion with htab_bolt_mapping()") moved the check for
whether we should make pages of the linear mapping executable from
htab_bolt_mapping into its callers, including htab_initialize.
A side-effect of this is that the decision is now made once for
each contiguous section in the LMB array rather than for each page
individually.  This can often mean that the whole of the linear
mapping ends up being executable.

This reverts to the previous behaviour, where individual pages are
checked for being part of the kernel text or not, by moving the check
back down into htab_bolt_mapping.
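A sketch of the per-page decision this commit restores, assuming a hypothetical `struct text_bounds` in place of the kernel's text section symbols. Checking each page individually means that only pages which actually overlap kernel text count as executable, rather than every page in a contiguous LMB section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical physical bounds of kernel text, [start, end). */
struct text_bounds {
	uint64_t start;
	uint64_t end;
};

/* True if [pa, pa + page_size) overlaps kernel text. */
static bool overlaps_text(const struct text_bounds *t, uint64_t pa,
			  uint64_t page_size)
{
	return pa < t->end && pa + page_size > t->start;
}

/*
 * Decide executability page by page across one contiguous section and
 * return how many pages would be mapped executable.
 */
static unsigned long count_exec_pages(const struct text_bounds *t,
				      uint64_t start, uint64_t end,
				      uint64_t page_size)
{
	unsigned long n = 0;

	for (uint64_t pa = start; pa < end; pa += page_size)
		if (overlaps_text(t, pa, page_size))
			n++;
	return n;
}
```

With 3MB of text at the start of a 256MB section mapped with 16M pages, only the first page is executable; a single per-section decision would have made all sixteen executable.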

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-09-03 20:53:22 +10:00
Tony Breeds
e16a9c0990 powerpc: Guard htab_dt_scan_hugepage_blocks appropriately
htab_dt_scan_hugepage_blocks is only used when CONFIG_HUGETLB_PAGE is
defined, so guard the declaration likewise.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-20 16:34:57 +10:00
Benjamin Herrenschmidt
bc033b63bb powerpc/mm: Fix attribute confusion with htab_bolt_mapping()
The function htab_bolt_mapping() is used to create permanent
mappings in the MMU hash table, for example, in order to create
the linear mapping of vmemmap.  It's also used by early boot
ioremap (before mem_init_done).

However, the way ioremap uses it is incorrect as it passes it the
protection flags in the "linux PTE" form while htab_bolt_mapping()
expects them in the hash table format.  This is made more confusing by
the fact that some of those flags are actually in the same position in
both cases.

This fixes it all by making htab_bolt_mapping() take normal linux
protection flags instead, and use a little helper to convert them to
htab flags. Callers can now use the usual PAGE_* definitions safely.
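A minimal sketch of the single-conversion-helper approach. All bit values below are hypothetical and chosen only to show the hazard: one flag happens to sit in the same position in both encodings (which hides the misuse), while another collides with an unrelated hash-table bit. They are not the real powerpc `_PAGE_*` or `HPTE_R_*` values.

```c
#include <stdint.h>

/* Hypothetical Linux-PTE-style protection bits. */
#define LPTE_RW		0x002u
#define LPTE_EXEC	0x008u
#define LPTE_NO_CACHE	0x020u	/* same position as HPTE_I below */
#define LPTE_GUARDED	0x040u

/* Hypothetical hash-table PTE bits, laid out differently. */
#define HPTE_PP_RW	0x002u	/* read/write */
#define HPTE_PP_RO	0x003u	/* read-only */
#define HPTE_N		0x004u	/* no-execute */
#define HPTE_G		0x008u	/* guarded: collides with LPTE_EXEC */
#define HPTE_I		0x020u	/* cache-inhibited */

/* Translate once, in one place, so callers can pass Linux-style flags. */
static uint32_t lpte_to_hpte(uint32_t prot)
{
	uint32_t r = (prot & LPTE_RW) ? HPTE_PP_RW : HPTE_PP_RO;

	if (!(prot & LPTE_EXEC))
		r |= HPTE_N;
	if (prot & LPTE_NO_CACHE)
		r |= HPTE_I;
	if (prot & LPTE_GUARDED)
		r |= HPTE_G;
	return r;
}
```

In this sketch, passing `LPTE_EXEC` straight through would silently request a guarded mapping, which is the kind of confusion a dedicated conversion helper removes.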

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-11 10:09:56 +10:00
Tony Breeds
c7c8eede27 powerpc: Force printing of 'total_memory' to unsigned long long
total_memory is a 'phys_addr_t', which can be either 64 or 32 bits.
Force printing as unsigned long long to silence the warning.
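A minimal sketch of the pattern, assuming a 64-bit `phys_addr_t` typedef for illustration (in the kernel its width depends on the configuration). Casting to `unsigned long long` lets a single `%llx` conversion match the argument whichever width is in use. `format_phys()` is a hypothetical helper.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption for this sketch: a 64-bit phys_addr_t. */
typedef uint64_t phys_addr_t;

/* Cast first, so "%llx" matches the argument whatever phys_addr_t is. */
static int format_phys(char *buf, size_t len, phys_addr_t addr)
{
	return snprintf(buf, len, "%llx", (unsigned long long)addr);
}
```

Casting to a fixed-width type such as `u64` instead does not help, because `u64` itself may be `unsigned long` on some architectures, which is the warning the following commit addresses.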

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-04 13:18:17 +10:00
Tony Breeds
fb61063587 powerpc: Fix compiler warning in arch/powerpc/mm/mem.c
Explicitly cast to unsigned long long, rather than u64.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-04 13:18:17 +10:00
Stephen Rothwell
b8b572e101 powerpc: Move include files to arch/powerpc/include/asm
from include/asm-powerpc.  This is the result of a

mkdir arch/powerpc/include/asm
git mv include/asm-powerpc/* arch/powerpc/include/asm

Followed by a few documentation/comment fixups and a couple of places
where <asm-powerpc/...> was being used explicitly.  Of the latter only
one was outside the arch code and it is a driver only built for powerpc.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-04 12:02:00 +10:00
Nick Piggin
ce0ad7f095 powerpc/mm: Lockless get_user_pages_fast() for 64-bit (v3)
Implement lockless get_user_pages_fast for 64-bit powerpc.

Page table existence is guaranteed with RCU, and speculative page references
are used to take a reference to the pages without having a prior existence
guarantee on them.
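The "speculative reference" idea can be sketched with C11 atomics: take a reference only if the count is currently non-zero, so a page that is already being freed is never brought back to life. This is a hypothetical userspace model (`struct fake_page`, `get_ref_unless_zero()`), similar in spirit to an increment-if-not-zero primitive; the real fast path must also re-check the page table entry after taking the reference and drop it if the entry changed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal stand-in for a page's reference count. */
struct fake_page {
	atomic_int count;
};

/*
 * Increment count unless it is zero. On CAS failure, 'old' is reloaded
 * with the current value and the zero check is repeated.
 */
static bool get_ref_unless_zero(struct fake_page *p)
{
	int old = atomic_load(&p->count);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&p->count, &old, old + 1))
			return true;
	}
	return false;
}
```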

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-30 15:26:54 +10:00
Benjamin Herrenschmidt
00df438e89 powerpc: Disable 64K hugetlb support when doing 64K SPU mappings
The 64K SPU local store mapping feature is incompatible with the
64K huge pages support due to the inability of some parts of
the memory management to differentiate between them, even though they
use different page table formats.

For now, disable 64K huge pages when CONFIG_SPU_FS_64K_LS is set.
In the long run, this can be fixed by making this feature use
the hugetlb page table format.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-28 16:30:53 +10:00
Johannes Weiner
bda2fa5355 powerpc: use generic show_mem()
Remove arch-specific show_mem() in favor of the generic version.

This also removes the following redundant information display:

	- pages in swapcache, printed by show_swap_cache_info()

where show_mem() calls show_free_areas(), which calls
show_swap_cache_info().

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:11 -07:00
Alexey Dobriyan
51cc50685a SL*B: drop kmem cache argument from constructor
Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexers.  Nobody uses this "feature", nor does anybody use
the passed kmem cache in a non-trivial way, so pass only a pointer to the object.
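A sketch of what a constructor looks like after this change: it receives only a pointer to the object. The toy cache below (`struct toy_cache`, `toy_cache_populate()`, `node_ctor()`) is hypothetical and far simpler than any real slab allocator; it only illustrates the calling convention.

```c
#include <stddef.h>

/* New-style constructor: only the object pointer is passed. */
typedef void (*ctor_fn)(void *obj);

/* Hypothetical toy cache: objects of one size carved from caller memory. */
struct toy_cache {
	size_t obj_size;
	ctor_fn ctor;
};

/* Run the cache's constructor on each of n objects laid out in mem. */
static void toy_cache_populate(const struct toy_cache *c, void *mem, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		void *obj = (char *)mem + i * c->obj_size;

		if (c->ctor)
			c->ctor(obj);
	}
}

/* Example object and constructor; no cache argument is needed. */
struct node {
	int refs;
	int owner;
};

static void node_ctor(void *obj)
{
	struct node *nd = obj;

	nd->refs = 0;
	nd->owner = -1;
}
```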

Non-trivial places are:
	arch/powerpc/mm/init_64.c
	arch/powerpc/mm/hugetlbpage.c

This is flag day, yes.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:07 -07:00
Luis Machado
d6a61bfc06 powerpc: BookE hardware watchpoint support
This patch implements support for HW based watchpoint via the
DBSR_DAC (Data Address Compare) facility of the BookE processors.

It does so by interfacing with the existing DABR breakpoint code
and adding the necessary bits and pieces for the new bits to
be properly set or cleared.

Signed-off-by: Luis Machado <luisgpm@br.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-25 15:44:39 +10:00
Jon Tollefson
0d9ea75443 powerpc: support multiple hugepage sizes
Instead of using the variable mmu_huge_psize to keep track of the huge
page size we use an array of MMU_PAGE_* values.  For each supported huge
page size we need to know the hugepte_shift value and have a
pgtable_cache.  The hstate or an mmu_huge_psizes index is passed to
functions so that they know which huge page size they should use.

The hugepage sizes 16M and 64K are set up (if available on the hardware) so
that they don't have to be set on the boot command line in order to use them.
The number of 16G pages has to be specified at boot time, though (e.g.
hugepagesz=16G hugepages=5).

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:19 -07:00
Jon Tollefson
91224346aa powerpc: define support for 16G hugepages
The huge page size is defined for 16G pages.  If a hugepagesz of 16G is
specified at boot-time then it becomes the huge page size instead of the
default 16M.

The change in pgtable-64K.h is to the macro pte_iterate_hashed_subpages to
make the increment to va (the 1 being shifted) be a long so that it is not
shifted to 0.  Otherwise it would create an infinite loop when the shift
value is for a 16G page (when base page size is 64K).
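A small sketch of why the shifted constant must be wide enough. It uses `UINT64_C(1)` so it behaves the same on any host (on a 64-bit kernel `1UL` is equivalent). With a 32-bit `int`, `1 << 34` is undefined behaviour, and in practice PowerPC's 32-bit shift yields 0 for such counts, so an iterator that adds the result to `va` never advances. `subpage_step()` and `count_steps()` are hypothetical helpers, not the macro from pgtable-64K.h.

```c
#include <stdint.h>

#define SHIFT_16G 34	/* 16G = 2^34 bytes */

/* Step between sub-pages; the constant must be 64 bits wide. */
static uint64_t subpage_step(unsigned int shift)
{
	return UINT64_C(1) << shift;	/* 1UL on a 64-bit kernel */
}

/* Count iterations of a va loop over [start, end). */
static unsigned long count_steps(uint64_t start, uint64_t end,
				 unsigned int shift)
{
	unsigned long n = 0;

	for (uint64_t va = start; va < end; va += subpage_step(shift))
		n++;
	return n;
}
```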

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:19 -07:00
Jon Tollefson
658013e93e powerpc: scan device tree for gigantic pages
The 16G huge pages have to be reserved in the HMC prior to boot.  The
locations of the pages are placed in the device tree.  This patch adds code
to scan the device tree during very early boot and save these page
locations until hugetlbfs is ready for them.

Acked-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:19 -07:00
Jon Tollefson
ec4b2c0c83 powerpc: function to allocate gigantic hugepages
The 16G page locations have been saved during early boot in an array.  The
alloc_bootmem_huge_page() function adds a page from here to the
huge_boot_pages list.

Acked-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:19 -07:00
Andi Kleen
ceb8687961 hugetlb: introduce pud_huge
Straightforward extensions for huge pages located in the PUD instead of
PMDs.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:18 -07:00
Andi Kleen
a551643895 hugetlb: modular state for hugetlb page size
The goal of this patchset is to support multiple hugetlb page sizes.  This
is achieved by introducing a new struct hstate structure, which
encapsulates the important hugetlb state and constants (e.g. huge page
size, number of huge pages currently allocated, etc).

The hstate structure is then passed around to the code that requires these
fields, so that code does the right thing regardless of the exact hstate it
is operating on.

This patch adds the hstate structure, with a single global instance of it
(default_hstate), and does the basic work of converting hugetlb to use the
hstate.
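A heavily simplified sketch of the hstate idea: per-size state lives in a structure, and helpers take a pointer to it instead of reading globals. `struct hstate_sketch`, its fields and the helpers are hypothetical stand-ins, not the kernel's `struct hstate` definition; 4K base pages are assumed.

```c
#include <stdint.h>

#define BASE_PAGE_SHIFT 12	/* assumption: 4K base pages */

/* Hypothetical subset of the per-size state an hstate carries. */
struct hstate_sketch {
	unsigned int order;		/* huge page = base page << order */
	unsigned long nr_huge_pages;	/* pages currently allocated */
	unsigned long free_huge_pages;
};

/* A single global instance, as in the first step of the conversion. */
static struct hstate_sketch default_hstate_sketch = { .order = 12 };	/* 16M */

/* Size-dependent helpers take the hstate rather than using globals. */
static uint64_t hs_page_size(const struct hstate_sketch *h)
{
	return (uint64_t)1 << (BASE_PAGE_SHIFT + h->order);
}

/* Number of huge pages needed to cover 'bytes', rounded up. */
static uint64_t hs_pages_for(const struct hstate_sketch *h, uint64_t bytes)
{
	uint64_t sz = hs_page_size(h);

	return (bytes + sz - 1) / sz;
}
```

Adding another page size then only requires another instance (for example one with `order = 22` for 16G pages); the helpers themselves do not change.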

Future patches will add more hstate structures to allow for different
hugetlbfs mounts to have different page sizes.

[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Adam Litke <agl@us.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:17 -07:00
Jan Beulich
42b7772812 mm: remove double indirection on tlb parameter to free_pgd_range() & Co
The double indirection here is not needed anywhere and hence (at least)
confusing.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:15 -07:00
Benjamin Herrenschmidt
a1f242ff46 powerpc ioremap_prot
This adds ioremap_prot and pte_pgprot() so that one can extract protection
bits from a PTE and use them to ioremap_prot() (in order to support ptrace
of VM_IO | VM_PFNMAP as per Rik's patch).

This moves a couple of flag checks around in the ioremap implementations
of arch/powerpc.  As a side effect, ppc32 now allows mappings that are
non-cacheable but not guarded; previously _PAGE_GUARDED was always set
whenever _PAGE_NO_CACHE was.

(Standard ioremap will still set _PAGE_GUARDED, but ioremap_prot will be
capable of creating such a non-guarded mapping.)

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:15 -07:00
Johannes Weiner
b61bfa3c46 mm: move bootmem descriptors definition to a single place
There are a lot of places that define either a single bootmem descriptor or an
array of them.  Use only one central array with MAX_NUMNODES items instead.

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:14 -07:00
Benjamin Herrenschmidt
84c3d4aaec Merge commit 'origin/master'
Manual merge of:

	arch/powerpc/Kconfig
	arch/powerpc/kernel/stacktrace.c
	arch/powerpc/mm/slice.c
	arch/ppc/kernel/smp.c
2008-07-16 11:07:59 +10:00
Stefan Roese
2bf3016f89 powerpc: Fix problems with 32bit PPC's running with >= 4GB of RAM
This patch enables 32bit PPC's (with 36bit physical address space, e.g.
IBM/AMCC PPC44x) to run with >= 4GB of RAM. Mostly it's just replacing types
(unsigned long -> phys_addr_t).

Tested on an AMCC Katmai with 4GB of DDR2.

Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2008-07-09 14:13:01 -04:00
Benjamin Herrenschmidt
1bc54c0311 powerpc: rework 4xx PTE access and TLB miss
This is some preliminary work to improve TLB management on SW loaded
TLB powerpc platforms. This introduces support for non-atomic PTE
operations in pgtable-ppc32.h and removes write-back to the PTE from
the TLB miss handlers. In addition, the DSI interrupt code no longer
tries to fix up write permission (this is left to generic code), and
_PAGE_HWWRITE is gone.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2008-07-09 13:36:17 -04:00
Stephen Rothwell
392096e98f generic-ipi: fix linux-next tree build failure
Today's linux-next build (powerpc ppc64_defconfig) failed like this:

arch/powerpc/mm/tlb_64.c: In function 'pgtable_free_now':
arch/powerpc/mm/tlb_64.c:66: error: too many arguments to function 'smp_call_function'
arch/powerpc/kernel/machine_kexec_64.c: In function 'kexec_prepare_cpus':
arch/powerpc/kernel/machine_kexec_64.c:175: error: too many arguments to function 'smp_call_function'

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: <linuxppc-dev@ozlabs.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-03 09:25:42 +02:00
Nathan Fontenot
0db9360aaa powerpc/pseries: Update numa association of hotplug memory add for drconf memory
Update the association of a memory section with a numa node that
occurs during hotplug add of a memory section.  This adds a check in
the hot_add_scn_to_nid() routine for the
ibm,dynamic-reconfiguration-memory node in the device tree.  If
present, the new hot_add_drconf_scn_to_nid() routine is invoked, which
can properly parse the ibm,dynamic-reconfiguration-memory node of the
device tree and make the proper numa node associations.

This also introduces the valid_hot_add_scn() routine as a helper
function for code that is common to the hot_add_scn_to_nid() and
hot_add_drconf_scn_to_nid() routines.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-03 16:58:18 +10:00
Nathan Fontenot
8342681d3e powerpc/pseries: Split code into helper routines for drconf memory
This splits off several pieces of code that parse the
ibm,dynamic-reconfiguration-memory node of the device tree into separate
helper routines.  This is in preparation for the next commit that will
use these helper routines.  There are no functional changes in this patch.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-03 16:58:17 +10:00
Tony Breeds
db7f37de2c powerpc: Fix building of arch/powerpc/mm/mem.o when MEMORY_HOTPLUG=y and SPARSEMEM=n
Currently the kernel fails to build with the above config options with:
  CC      arch/powerpc/mm/mem.o
arch/powerpc/mm/mem.c: In function 'arch_add_memory':
arch/powerpc/mm/mem.c:130: error: implicit declaration of function 'create_section_mapping'

This explicitly includes asm/sparsemem.h in arch/powerpc/mm/mem.c and
moves the guards in include/asm-powerpc/sparsemem.h to protect the
SPARSEMEM specific portions only.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-03 16:58:07 +10:00
Dave Kleikamp
87e9ab13c3 powerpc: hash_huge_page() should get the WIMG bits from the lpte
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-01 11:28:02 +10:00
Paul Mackerras
3a8247cc2c powerpc: Only demote individual slices rather than whole process
At present, if we have a kernel with a 64kB page size, and some
process maps something that has to be mapped with 4kB pages (such as a
cache-inhibited mapping on POWER5+, or the eHCA infiniband queue-pair
pages), we change the process to use 4kB pages everywhere.  This hurts
the performance of HPC programs that access eHCA from userspace.

With this patch, the kernel will only demote the slice(s) containing
the eHCA or cache-inhibited mappings, leaving the remaining slices
able to use 64kB hardware pages.
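A sketch of per-slice demotion, assuming 256MB low slices below 4GB (16 of them) and ignoring the larger high slices. `struct slice_psizes`, `enum psize` and `demote_slice_for()` are hypothetical; the point is that only the slice containing the 4K-only mapping changes page size.

```c
#include <stdint.h>

/* Assumption: 256MB low slices covering the first 4GB. */
#define SLICE_LOW_SHIFT	28
#define SLICE_NUM_LOW	16	/* 4GB / 256MB */

enum psize { PSIZE_64K, PSIZE_4K };

/* Hypothetical per-process record of each low slice's page size. */
struct slice_psizes {
	enum psize low[SLICE_NUM_LOW];
};

/*
 * Demote only the slice containing addr to 4K pages, instead of the
 * whole process. Returns the slice index, or -1 if addr is outside the
 * low slices handled by this sketch.
 */
static int demote_slice_for(struct slice_psizes *sp, uint64_t addr)
{
	uint64_t idx = addr >> SLICE_LOW_SHIFT;

	if (idx >= SLICE_NUM_LOW)
		return -1;
	sp->low[idx] = PSIZE_4K;
	return (int)idx;
}
```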

This also changes the slice_get_unmapped_area code so that it is
willing to place a 64k-page mapping into (or across) a 4k-page slice
if there is no better alternative, i.e. if the program specified
MAP_FIXED or if there is not sufficient space available in slices that
are either empty or already have 64k-page mappings in them.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-01 11:27:57 +10:00
Becky Bruce
316a405841 powerpc: Get rid of bitfields in ppc_bat struct
While working on the 36-bit physical support, I noticed that there
was exactly one line of code that actually referenced the bitfields.
So I got rid of them and redefined ppc_bat as a struct of 2 u32's:
batu and batl.  I also got rid of the previous union that held the
bitfield structs and a word representation of the batu/l values.
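The resulting shape of the structure, sketched in C. The `bat_valid()` helper is hypothetical and assumes the classic 32-bit BAT layout in which the two low-order bits of BATU are the Vs/Vp valid bits; it shows how a former bitfield access becomes a mask test on the raw word.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t u32;

/* After the change: two raw register words, no bitfields, no union. */
struct ppc_bat {
	u32 batu;	/* upper BAT register value */
	u32 batl;	/* lower BAT register value */
};

/* Hypothetical helper: Vs/Vp assumed to be the low two bits of BATU. */
static bool bat_valid(const struct ppc_bat *b)
{
	return (b->batu & 0x3) != 0;
}
```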

This seems like a nicer solution than adding in a bunch of
new bitfields to support extended bat addressing that would never
get used, and just leaving the struct as-is would have been
incomplete in the face of large physical addressing.

Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-06-30 22:31:05 +10:00
Becky Bruce
7c5c4325d2 powerpc: Change BAT code to use phys_addr_t
Currently, the physical address is an unsigned long, but it should
be phys_addr_t in set_bat, [v/p]_mapped_by_bat.  Also, create a
macro that can convert a large physical address into the correct
format for programming the BAT registers.

Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-06-30 22:31:03 +10:00
Benjamin Herrenschmidt
41743a4e34 powerpc: Free a PTE bit on ppc64 with 64K pages
This frees a PTE bit when using 64K pages on ppc64.  This is done
by getting rid of the separate _PAGE_HASHPTE bit.  Instead, we just test
if any of the 16 sub-page bits is set.  For non-combo pages (i.e. real
64K pages), we set SUB0 and the location encoding in that field.
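A sketch of testing the sub-page field instead of a dedicated bit. The field position (`PTE_COMBO_SHIFT`), the mask, the `PTE_SUB0` value and the 4-bit slot encoding in `mark_64k_hashed()` are all hypothetical; the real 64K PTE layout differs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical position of the 16 sub-page "has HPTE" bits. */
#define PTE_COMBO_SHIFT	12
#define PTE_COMBO_MASK	(0xffffULL << PTE_COMBO_SHIFT)
#define PTE_SUB0	(0x8000ULL << PTE_COMBO_SHIFT)	/* sub-page 0 */

/* Replaces a dedicated hashed bit: hashed iff any sub-page bit is set. */
static bool pte_is_hashed(uint64_t pte)
{
	return (pte & PTE_COMBO_MASK) != 0;
}

/* Real 64K page: set SUB0 and store a slot encoding in the same field. */
static uint64_t mark_64k_hashed(uint64_t pte, unsigned int slot)
{
	pte &= ~PTE_COMBO_MASK;
	return pte | PTE_SUB0 | (((uint64_t)slot & 0xf) << PTE_COMBO_SHIFT);
}
```

Setting SUB0 for a real 64K page guarantees the field is non-zero, so the same test still reports it as hashed even though the rest of the field holds the location encoding.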

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-06-30 22:30:53 +10:00
Paul Mackerras
e9a4b6a3f6 Merge branch 'linux-2.6' 2008-06-30 10:16:50 +10:00
Jens Axboe
15c8b6c1aa on_each_cpu(): kill unused 'retry' parameter
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:24:38 +02:00
Paul Mackerras
65ba6cdc83 [POWERPC] Clear sub-page HPTE present bits when demoting page size
When we demote a slice from 64k to 4k, and we are about to insert an
HPTE for a 4k subpage and we notice that there is an existing 64k
HPTE, we first invalidate that HPTE before inserting the new 4k
subpage HPTE.  Since the bits that encode which hash bucket the old
HPTE was in overlap with the bits that encode which of the 16 subpages
have HPTEs, we need to clear out the subpage HPTE-present bits before
starting to insert HPTEs for the 4k subpages.  If we don't do that, we
can erroneously think that a subpage already has an HPTE when it
doesn't.

That in itself wouldn't be such a problem except that when we go to
update the HPTE that we think is present on machines with a
hypervisor, the hypervisor can tell us that the HPTE we think is there
is actually there even though it isn't, which can lead to a process
getting stuck in a loop, continually faulting.  The reason for the
confusion is that the AVPN (abbreviated virtual page number) we are
looking for in the HPTE for a 4k subpage can actually match the AVPN
in a stale HPTE for another 64k page.  For example, the HPTE for
the 4k subpage at 0x84000f000 will be in the same hash bucket and have
the same AVPN as the HPTE for the 64k page at 0x8400f0000.

This fixes the code to clear out the subpage HPTE-present bits.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-06-18 21:40:43 +10:00
Paul Mackerras
8a3e1c670e Merge branch 'merge'
Conflicts:

	arch/powerpc/sysdev/fsl_soc.c
2008-06-09 12:19:41 +10:00
Nathan Lynch
0d5799449f [POWERPC] Make walk_memory_resource available with MEMORY_HOTPLUG=n
The ehea driver was recently changed[1] to use walk_memory_resource() to
detect the system's memory layout.  However, walk_memory_resource() is
available only when memory hotplug is enabled.  So CONFIG_EHEA was
made to depend on MEMORY_HOTPLUG [2], but it is inappropriate for a
network driver to have such a dependency.

Make the declaration of walk_memory_resource() and its powerpc
implementation (ehea is powerpc-specific) unconditionally available.

[1] 48cfb14f8b
    "ehea: Add DLPAR memory remove support"

[2] fb7b6ca2b6
    "ehea: Add dependency to Kconfig"

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-06-09 11:32:41 +10:00
Paul Mackerras
acf464817d Merge branch 'merge' into powerpc-next 2008-05-23 16:53:23 +10:00
David Gibson
46a7417963 [POWERPC] Fix __set_fixmap() for STRICT_MM_TYPECHECKS
__set_fixmap() in pgtable_32.c currently fails to compile if
STRICT_MM_TYPECHECKS is defined.  This fixes it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-23 16:15:32 +10:00
Adrian Bunk
d3d3d3cdb1 [POWERPC] powerpc/mm/hash_low_32.S: Remove CVS keyword
This removes from a comment a CVS keyword that hadn't been updated for
a long time.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-20 09:34:18 +10:00
Paul Mackerras
fcff474ea5 Merge branch 'linux-2.6' into powerpc-next 2008-05-16 23:13:42 +10:00
Benjamin Herrenschmidt
cec08e7a94 [POWERPC] vmemmap fixes to use smaller pages
This changes vmemmap to use a different region (region 0xf) of the
address space, and to configure the page size of that region
dynamically at boot.

The problem with the current approach of always using 16M pages is that
it's not well suited to machines that have small amounts of memory such
as small partitions on pseries, or PS3's.

In fact, on the PS3, failure to allocate the 16M page backing vmemmap
tends to prevent hotplugging the HV's "additional" memory, thus limiting
the available memory even more, in my experience down to something
like 80M total, which makes it really not very usable.

The logic used by my patch to choose the vmemmap page size is:

 - If 16M pages are available and there's 1G or more RAM at boot,
   use that size.
 - Else if 64K pages are available, use that.
 - Else use 4K pages.
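The selection rule listed above, sketched as a C function. The inputs (`have_16m`, `have_64k`, `boot_ram_bytes`) and the enum are hypothetical; the real code derives them from the MMU features and memory size discovered at boot.

```c
#include <stdbool.h>
#include <stdint.h>

enum vmemmap_psize { VMEMMAP_4K, VMEMMAP_64K, VMEMMAP_16M };

/* Apply the three rules in order; the first match wins. */
static enum vmemmap_psize pick_vmemmap_psize(bool have_16m, bool have_64k,
					     uint64_t boot_ram_bytes)
{
	if (have_16m && boot_ram_bytes >= (1ULL << 30))
		return VMEMMAP_16M;
	if (have_64k)
		return VMEMMAP_64K;
	return VMEMMAP_4K;
}
```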

I've tested on a POWER6 (16M pages) and on an iSeries POWER3 (4K pages)
and it seems to work fine.

Note that I intend to change the way we organize the kernel regions &
SLBs so the actual region will change from 0xf back to something else at
one point, as I simplify the SLB miss handler, but that will be for a
later patch.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-15 20:49:25 +10:00
Michael Ellerman
c884116ac3 [POWERPC] Remove duplicate variable definitions in mm/tlb_64.c
Somewhere along the way (e28f7faf05,
"Four level pagetables for ppc64") we ended up with duplicate
definitions for pte_freelist_cur and pte_freelist_force_free.
Somehow this compiles, but it would be better to just have one
definition for each.

The two definitions we end up with can be static too!

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-14 22:31:49 +10:00
Michael Ellerman
572fb578de [POWERPC] Move declaration of tce variables into mmu-hash64.h
... instead of having extern declarations in a .c file.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-14 22:31:47 +10:00
Michael Ellerman
09de9ff872 [POWERPC] Fix sparse warnings in arch/powerpc/mm
Make two vmemmap helpers static in init_64.c
Make stab variables static in stab.c
Make psize defs static in hash_utils_64.c

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-14 22:31:46 +10:00
Michael Ellerman
5f25f06529 [POWERPC] Move declaration of init_bootmem_done into system.h
... instead of having an extern declaration in a .c file.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-14 22:31:44 +10:00
Paul Mackerras
3b5750644b [POWERPC] Bolt in SLB entry for kernel stack on secondary cpus
This fixes a regression reported by Kamalesh Babulal where a POWER4
machine would crash because of an SLB miss at a point where the SLB
miss exception was unrecoverable.  This regression is tracked at:

http://bugzilla.kernel.org/show_bug.cgi?id=10082

SLB misses at such points shouldn't happen because the kernel stack is
the only memory accessed other than things in the first segment of the
linear mapping (which is mapped at all times by entry 0 of the SLB).
The context switch code ensures that SLB entry 2 covers the kernel
stack, if it is not already covered by entry 0.  None of entries 0
to 2 are ever replaced by the SLB miss handler.

Where this went wrong is that the context switch code assumes it
doesn't have to write to SLB entry 2 if the new kernel stack is in the
same segment as the old kernel stack, since entry 2 should already be
correct.  However, when we start up a secondary cpu, it calls
slb_initialize, which doesn't set up entry 2.  This is correct for
the boot cpu, where we will be using a stack in the kernel BSS at this
point (i.e. init_thread_union), but not necessarily for secondary
cpus, whose initial stack can be allocated anywhere.  This doesn't
cause any immediate problem since the SLB miss handler will just
create an SLB entry somewhere else to cover the initial stack.

In fact it's possible for the cpu to go quite a long time without SLB
entry 2 being valid.  Eventually, though, the entry created by the SLB
miss handler will get overwritten by some other entry, and if the next
access to the stack is at an unrecoverable point, we get the crash.

This fixes the problem by making slb_initialize create a suitable
entry for the kernel stack, if we are on a secondary cpu and the stack
isn't covered by SLB entry 0.  This requires initializing the
get_paca()->kstack field earlier, so I do that in smp_create_idle
where the current field is initialized.  This also abstracts a bit of
the computation that mk_esid_data in slb.c does so that it can be used
in slb_initialize.
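The segment test that decides whether the kernel stack needs its own bolted entry can be sketched in C. This is a simplified model, assuming 256MB segments (ESID = address >> 28) and that entry 0 covers the first segment of the linear mapping at 0xc000000000000000; the helper names are hypothetical and do not come from slb.c or the context switch code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 256MB segment size: the ESID is the effective address >> 28. */
#define SEG_SHIFT 28
/* Assumed base of the linear mapping, covered by bolted SLB entry 0. */
#define LINEAR_BASE 0xc000000000000000ULL

static inline uint64_t esid_of(uint64_t ea) { return ea >> SEG_SHIFT; }

/* Hypothetical helper: a stack outside entry 0's segment needs entry 2. */
static bool stack_needs_bolted_entry(uint64_t kstack)
{
    return esid_of(kstack) != esid_of(LINEAR_BASE);
}

/* Hypothetical model of the context-switch shortcut: entry 2 is rewritten
 * only when the new stack is in a different segment from the old one and
 * is not already covered by entry 0. The shortcut is only safe if entry 2
 * was valid to begin with, which is what this fix guarantees for
 * secondary cpus by setting it up in slb_initialize. */
static bool switch_must_rewrite_entry2(uint64_t old_stack, uint64_t new_stack)
{
    return esid_of(old_stack) != esid_of(new_stack) &&
           stack_needs_bolted_entry(new_stack);
}
```

In this model the bug corresponds to a secondary cpu whose initial stack satisfies `stack_needs_bolted_entry()` while entry 2 was never written, so every later switch to a stack in the same segment skips the rewrite.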

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-02 15:00:45 +10:00
Geoff Levand
bbea346062 [POWERPC] Fix slb.c compile warnings
Arrange for a syntax check to always be done on the powerpc/mm/slb.c
DBG() macro by defining it to pr_debug() for non-debug builds.

Also, fix these related compile warnings:

  slb.c:273: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'long unsigned int'
  slb.c:274: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'long unsigned int'
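Both halves of this change can be illustrated in userspace C. Since pr_debug() is kernel-only, the sketch uses printf() as a stand-in: in non-debug builds the call sits behind `if (0)`, so it is never executed but the compiler still type-checks the format string. The `DBG` definition and the `format_esid` helper are hypothetical analogues, not the code in slb.c. The format fix is using the `l` length modifier (`%04lx`) for an `unsigned long` argument instead of `%04x`.

```c
#include <stdio.h>

/* Userspace stand-in for defining DBG() via pr_debug(): in non-debug
 * builds the printf is dead code, but its arguments are still checked
 * against the format string, so mismatches like %04x vs unsigned long
 * produce warnings in every build. */
#ifdef DEBUG
#define DBG(...) printf(__VA_ARGS__)
#else
#define DBG(...) do { if (0) printf(__VA_ARGS__); } while (0)
#endif

/* Hypothetical helper: format an unsigned long ESID with the matching
 * length modifier (%lx, not %x). Returns the number of characters written. */
static int format_esid(char *buf, size_t len, unsigned long esid)
{
    DBG("esid = %04lx\n", esid);
    return snprintf(buf, len, "%04lx", esid);
}
```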

Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-02 15:00:44 +10:00
Badari Pulavarty
9d88a2eb6e [POWERPC] Provide walk_memory_resource() for powerpc
Provide walk_memory_resource() for 64-bit powerpc.  PowerPC maintains
logical memory region mapping in the lmb.memory structure.  Walk
through these structures and do the callbacks for the contiguous
chunks.
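The walk can be sketched as a loop over memory regions that invokes the callback once per contiguous chunk intersecting the requested pfn range. This is an illustrative model only: `struct region`, `walk_regions`, `count_pages` and the 4K page shift are hypothetical simplifications of lmb.memory, not the lmb API. As an assumption modeled on the kernel convention, it returns -1 when no chunk is found and stops at the first nonzero callback result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one lmb.memory region (bytes). */
struct region { uint64_t base; uint64_t size; };

typedef int (*range_cb)(uint64_t start_pfn, uint64_t nr_pages, void *arg);

#define SKETCH_PAGE_SHIFT 12 /* assumed 4K pages */

/* Call func once for each contiguous chunk of regs[] that intersects
 * [start_pfn, start_pfn + nr_pages). Returns -1 if nothing matched,
 * otherwise the last callback result (stopping early on nonzero). */
static int walk_regions(const struct region *regs, size_t n,
                        uint64_t start_pfn, uint64_t nr_pages,
                        range_cb func, void *arg)
{
    uint64_t end_pfn = start_pfn + nr_pages;
    int ret = -1;

    for (size_t i = 0; i < n; i++) {
        uint64_t r_start = regs[i].base >> SKETCH_PAGE_SHIFT;
        uint64_t r_end = (regs[i].base + regs[i].size) >> SKETCH_PAGE_SHIFT;
        uint64_t lo = r_start > start_pfn ? r_start : start_pfn;
        uint64_t hi = r_end < end_pfn ? r_end : end_pfn;

        if (lo >= hi)
            continue; /* region does not overlap the requested range */
        ret = func(lo, hi - lo, arg);
        if (ret)
            break;
    }
    return ret;
}

/* Example callback: accumulate the number of pages visited. */
static int count_pages(uint64_t start_pfn, uint64_t nr_pages, void *arg)
{
    (void)start_pfn;
    *(uint64_t *)arg += nr_pages;
    return 0;
}
```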

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 15:57:53 +10:00
Jeremy Fitzhardinge
180c06efce hotplug-memory: make online_page() common
All architectures use an effectively identical definition of online_page(), so
just make it common code.  x86-64, ia64, powerpc and sh are actually
identical; x86-32 is slightly different.

x86-32's differences arise because it puts its hotplug pages in the highmem
zone.  We can handle this in the generic code by inspecting the page to see if
it's in highmem, and updating the totalhigh_pages count appropriately.  This
leaves init_32.c:free_new_highpage with a single caller, so I folded it into
add_one_highpage_init.

I also removed an incorrect comment referring to the NUMA case; any NUMA
details have already been dealt with by the time online_page() is called.
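The shape of the common routine described above can be modeled in userspace C. This is a sketch under stated assumptions, not the generic kernel code: `struct page_sketch`, the `_s` counters and `online_page_sketch` are hypothetical, and the boolean fields stand in for PageHighMem(), the reserved bit and handing the page to the page allocator.

```c
#include <stdbool.h>

/* Hypothetical, minimal model of a struct page for this sketch. */
struct page_sketch {
    bool highmem;  /* stands in for PageHighMem(page) */
    bool reserved; /* stands in for the PG_reserved bit */
    bool free;     /* true once the page is handed to the allocator */
};

/* Hypothetical stand-ins for totalram_pages / totalhigh_pages. */
static unsigned long totalram_pages_s, totalhigh_pages_s;

/* One generic routine for all architectures: the x86-32 difference is
 * handled by checking whether the page is in highmem. */
static void online_page_sketch(struct page_sketch *page)
{
    totalram_pages_s++;
    if (page->highmem)
        totalhigh_pages_s++;
    page->reserved = false; /* like ClearPageReserved() */
    page->free = true;      /* like freeing the page into the allocator */
}
```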

[akpm@linux-foundation.org: fix indenting]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Tested-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:17 -07:00