set_page_count usage outside mm/ is limited to setting the refcount to 1.
Remove set_page_count from outside mm/, and replace those users with
init_page_count() and set_page_refcounted().
This allows more debug checking, and tighter control on how code is allowed
to play around with page->_count.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This makes it possible to build kernels for PReP and/or CHRP
with ARCH=ppc by removing the (non-building) powermac support.
It's now also possible to select PReP and CHRP independently.
Powermac users should now build with ARCH=powerpc instead of
ARCH=ppc. (This does mean that it is no longer possible to
build a 32-bit kernel for a G5.)
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently 8xx fails to boot due to endless pagefaults.
Seems the bug is exposed by the change which avoids flushing the
TLB when not necessary (in case the pte has not changed), introduced
recently:
__handle_mm_fault():
entry = pte_mkyoung(entry);
if (!pte_same(old_entry, entry)) {
ptep_set_access_flags(vma, address, pte, entry, write_access);
update_mmu_cache(vma, address, entry);
lazy_mmu_prot_update(entry);
} else {
/*
* This is needed only for protection faults but the arch code
* is not yet telling us if this is a protection fault or not.
* This still avoids useless tlb flushes for .text page faults
* with threads.
*/
if (write_access)
flush_tlb_page(vma, address);
}
The "update_mmu_cache()" call was unconditional before, which caused the TLB
to be flushed by:
if (pfn_valid(pfn)) {
struct page *page = pfn_to_page(pfn);
if (!PageReserved(page)
&& !test_bit(PG_arch_1, &page->flags)) {
if (vma->vm_mm == current->active_mm) {
#ifdef CONFIG_8xx
/* On 8xx, cache control instructions (particularly
* "dcbst" from flush_dcache_icache) fault as write
* operation if there is an unpopulated TLB entry
* for the address in question. To workaround that,
* we invalidate the TLB here, thus avoiding dcbst
* misbehaviour.
*/
_tlbie(address);
#endif
__flush_dcache_icache((void *) address);
} else
flush_dcache_icache_page(page);
set_bit(PG_arch_1, &page->flags);
}
Which worked to due to pure luck: PG_arch_1 was always unset before, but
now it isnt.
The root of the problem are the changes against the 8xx TLB handlers introduced
during v2.6. What happens is the TLBMiss handlers load the zeroed pte into
the TLB, causing the TLBError handler to be invoked (thats two TLB faults per
pagefault), which then jumps to the generic MM code to setup the pte.
The bug is that the zeroed TLB is not invalidated (the same reason
for the "dcbst" misbehaviour), resulting in infinite TLBError faults.
The "two exception" approach requires a TLB flush (to nuke the zeroed TLB)
at each PTE update for correct behaviour:
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Changed jobs and the Freescale address is no longer valid.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
First step in pushing down the page_table_lock. init_mm.page_table_lock has
been used throughout the architectures (usually for ioremap): not to serialize
kernel address space allocation (that's usually vmlist_lock), but because
pud_alloc,pmd_alloc,pte_alloc_kernel expect caller holds it.
Reverse that: don't lock or unlock init_mm.page_table_lock in any of the
architectures; instead rely on pud_alloc,pmd_alloc,pte_alloc_kernel to take
and drop it when allocating a new one, to check lest a racing task already
did. Similarly no page_table_lock in vmalloc's map_vm_area.
Some temporary ugliness in __pud_alloc and __pmd_alloc: since they also handle
user mms, which are converted only by a later patch, for now they have to lock
differently according to whether or not it's init_mm.
If sources get muddled, there's a danger that an arch source taking
init_mm.page_table_lock will be mixed with common source also taking it (or
neither take it). So break the rules and make another change, which should
break the build for such a mismatch: remove the redundant mm arg from
pte_alloc_kernel (ppc64 scrapped its distinct ioremap_mm in 2.6.13).
Exceptions: arm26 used pte_alloc_kernel on user mm, now pte_alloc_map; ia64
used pte_alloc_map on init_mm, now pte_alloc_kernel; parisc had bad args to
pmd_alloc and pte_alloc_kernel in unused USE_HPPA_IOREMAP code; ppc64
map_io_page forgot to unlock on failure; ppc mmu_mapin_ram and ppc64 im_free
took page_table_lock for no good reason.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the phys_mem_access_prot() function to take a pfn instead of an
address. This allows mmap64() to work on /dev/mem for addresses above 4G
on 32-bit architectures. We start with a pfn in mmap_mem(), so there's no
need to convert to an address; in fact, it's actively bad, since the
conversion can overflow when the address is above 4G.
Similarly fix the ppc32 page_is_ram() function to avoid a conversion to an
address by directly comparing to max_pfn. Working with max_pfn instead of
high_memory fixes page_is_ram() to give the right answer for highmem pages.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Here is a new patch that removes all notion of the pmac, prep,
chrp and openfirmware initialization sections, and then unifies
the sections.h files without those __pmac, etc, sections identifiers
cluttering things up.
Signed-off-by: Jon Loeliger <jdl@freescale.com>
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This is a patch that I have had in my tree for ages. If init causes
an exception that raises a signal, such as a SIGSEGV, SIGILL or
SIGFPE, and it hasn't registered a handler for it, we don't deliver
the signal, since init doesn't get any signals that it doesn't have a
handler for. But that means that we just return to userland and
generate the same exception again immediately. With this patch we
print a message and kill init in this situation.
This is very useful when you have a bug in the kernel that means that
init doesn't get as far as executing its first instruction. :)
Without this patch the system hangs when it gets to starting the
userland init; with it you at least get a message giving you a clue
about what has gone wrong.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is the default configuration for Marvell EV64360BP board. It has been
tested on the board.
Signed-off-by: Lee Nicks <allinux@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
flush_dcache_icache_page() will be called on an instruction page fault. We
can't sleep in the fault handler, so use kmap_atomic() instead of just
kmap() for the Book-E case.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On 8xx, in the case where a pagefault happens for a process who's not
the owner of the vma in question (ptrace for instance), the flush
operation is performed via the physical address.
Unfortunately, that results in a strange, unexplainable "icbi"
instruction fault, most likely due to a CPU bug (see oops below).
Avoid that by flushing the page via its kernel virtual address.
Oops: kernel access of bad area, sig: 11 [#2]
NIP: C000543C LR: C000B060 SP: C0F35DF0 REGS: c0f35d40 TRAP: 0300 Not tainted
MSR: 00009022 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 10
DAR: 00000010, DSISR: C2000000
TASK = c0ea8430[761] 'gdbserver' THREAD: c0f34000
Last syscall: 26
GPR00: 00009022 C0F35DF0 C0EA8430 00F59000 00000100 FFFFFFFF 00F58000
00000001
GPR08: C021DAEF C0270000 00009032 C0270000 22044024 10025428 01000800
00000001
GPR16: 007FFF3F 00000001 00000000 7FBC6AC0 00F61022 00000001 C0839300
C01E0000
GPR24: 00CD0889 C082F568 3000AC18 C02A7A00 C0EA15C8 00F588A9 C02ACB00
C02ACB00
NIP [c000543c] __flush_dcache_icache_phys+0x38/0x54
LR [c000b060] flush_dcache_icache_page+0x20/0x30
Call trace:
[c000b154] update_mmu_cache+0x7c/0xa4
[c005ae98] do_wp_page+0x460/0x5ec
[c005c8a0] handle_mm_fault+0x7cc/0x91c
[c005ccec] get_user_pages+0x2fc/0x65c
[c0027104] access_process_vm+0x9c/0x1d4
[c00076e0] sys_ptrace+0x240/0x4a4
[c0002bd0] ret_from_syscall+0x0/0x44
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The proposed _tlbie call at update_mmu_cache() is safe because:
Addresses for which update_mmu_cache() gets invocated are never inside the
static kernel virtual mapping, meaning that there is no risk for the
_tlbie() here to be thrashing the pinned entry, as Dan suspected.
The intermediate TLB state in which this bug can be triggered is not
visible by userspace or any other contexts, except the page fault handling
path. So there is no need to worry about userspace dcbxxx users.
The other solution to this is to avoid dcbst misbehaviour in the first
place, which involves changing in-kernel "dcbst" callers to use 8xx
specific SPR's.
Summary:
On 8xx, cache control instructions (particularly "dcbst" from
flush_dcache_icache) fault as write operation if there is an unpopulated
TLB entry for the address in question. To workaround that, we invalidate
the TLB here, thus avoiding dcbst misbehaviour.
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Continue the Good Fight: Limit bootmem.h include creep.
Signed-off-by: Jon Loeliger <jdl@freescale.com>
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The e200 core is a Book-E core (similar to e500) that has a unified L1 cache
and is not cache coherent on the bus. The e200 core also adds a separate
exception level for debug exceptions. Part of this patch helps to cleanup a
few cases that are true for all Freescale Book-E parts, not just e500.
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch kills the whole embedded System.map mecanism and the
bootloader-passed System.map that was used to provide symbol resolution in
xmon. Instead, xmon now uses kallsyms like ppc64 does.
No hurry getting that in Linus tree, let it be tested in -mm for a while
first and make sure it doesn't break various embedded configs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Made the number of TLB CAM entries private and converted the board
consumers to use num_tlbcam_entries which is setup at boot time from
configuration registers. This way the only consumers of the #define
NUM_TLBCAMS are the arrays used to manage the TLB.
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove PG_highmem, to save a page flag. Use is_highmem() instead. It'll
generate a little more code, but we don't use PageHigheMem() in many places.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On ppc32, the platform code can supply a "progress" function that is
used to show progress through the boot. These functions are usually
in an init section and so can't be called after the init pages are
freed. Now that the cpu bringup code can be called after the system
is booted (for hotplug cpu) we can get the situation where the
progress function can be called after boot. The simple fix is to set
the progress function pointer to NULL when the init pages are freed,
and that is what this patch does (note that all callers already check
whether the function pointer is NULL before trying to call it).
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
CONFIG_PTE_64BIT & CONFIG_PHYS_64BIT are not currently consistently used in
the code base. Fixed up the usage such that CONFIG_PTE_64BIT is used when we
have a 64-bit PTE regardless of physical address width. CONFIG_PHYS_64BIT is
used if the physical address width is larger than 32-bits, regardless of PTE
size.
These changes required a few sub-arch specific ifdef's to be fixed and the
introduction of a physical address format string.
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!