Commit Graph

33 Commits

Author SHA1 Message Date
Thomas Gleixner
76ebd0548d x86: introduce page pool in cpa
DEBUG_PAGEALLOC was not possible on 64-bit due to its early-bootup
hardcoded reliance on PSE pages, and the unrobustness of the runtime
splitup of large pages. The splitup ended in recursive calls to
alloc_pages() when a page for a pte split was requested.

Avoid the recursion with a preallocated page pool, which is used to
split up large mappings and gets refilled in the return path of
kernel_map_pages after the split has been done. The size of the page
pool is adjusted to the available memory.

This part just implements the page pool and the initialization w/o
using it yet.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-09 23:24:09 +01:00
Ian Campbell
551889a6e2 x86: construct 32-bit boot time page tables in native format.
Specifically the boot time page tables in a CONFIG_X86_PAE=y enabled
kernel are in PAE format.

early_ioremap is updated to use the standard page table accessors.

Clear any mappings beyond max_low_pfn from the boot page tables in
native_pagetable_setup_start because the initial mappings can extend
beyond the range of physical memory and into the vmalloc area.

Derived from patches by Eric Biederman and H. Peter Anvin.

[ jeremy@goop.org: PAE swapper_pg_dir needs to be page-sized fix ]

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Mika Penttilä <mika.penttila@kolumbus.fi>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-09 23:24:09 +01:00
H. Peter Anvin
f832ff18e8 x86: use _ASM_EXTABLE macro in arch/x86/mm/init_32.c
Use the _ASM_EXTABLE macro from <asm/asm.h>, instead of open-coding
__ex_table entires in arch/x86/mm/init_32.c.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:47:58 +01:00
Rafael J. Wysocki
a6eb84bc1e suspend: cleanup reference to swsusp_pg_dir[]
swsusp_pg_dir[] is used for suspend, but not for hibernation.
clean-up the ifdefs which worked by accident, while implying the opposite.
Delete the __nosavedata, which also implied the opposite.

Some day we may optimize CONFIG_ACPI_SLEEP to build minimal kernels
for just hibernate or just suspend but not both,
but today isn't that day.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-01 18:30:59 -05:00
Jeremy Fitzhardinge
6194ba6ff6 x86: don't special-case pmd allocations as much
In x86 PAE mode, stop treating pmds as a special case.  Previously
they were always allocated and freed with the pgd.  The modifies the
code to be the same as 64-bit mode, where they are allocated on
demand.

This is a step on the way to unifying 32/64-bit pagetable allocation
as much as possible.

There is a complicating wart, however.  When you install a new
reference to a pmd in the pgd, the processor isn't guaranteed to see
it unless you reload cr3.  Since reloading cr3 also has the
side-effect of flushing the tlb, this is an expense that we want to
avoid whereever possible.

This patch simply avoids reloading cr3 unless the update is to the
current pagetable.  Later patches will optimise this further.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: William Irwin <wli@holomorphy.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:11 +01:00
Ingo Molnar
d7d119d777 x86: arch/x86/mm/init_32.c printk fixes
printk fixes. NOP in terms of functionality, but strings got
a bit larger due to the KERN_ markers that were added.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:10 +01:00
Ingo Molnar
8550eb9982 x86: arch/x86/mm/init_32.c cleanup
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:10 +01:00
Ingo Molnar
86f03989d9 x86: cpa: fix the self-test
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:09 +01:00
Ingo Molnar
ee01f1122c x86: init memory debugging
debug incorrect/late access to init memory, by permanently unmapping
the init memory ranges. Depends on CONFIG_DEBUG_PAGEALLOC=y.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:09 +01:00
Arjan van de Ven
edeed30589 x86: add testcases for RODATA and NX protections/attributes
Latest update; I now have 4 NX tests, but 2 fail so they're #if 0'd.
I also cleaned up the NX test code quite a bit, and got rid of the ugly
exception table sorting stuff.

From: Arjan van de Ven <arjan@linux.intel.com>

This patch adds testcases for the CONFIG_DEBUG_RODATA configuration option
as well as the NX CPU feature/mappings. Both testcases can move to tests/
once that patch gets merged into mainline.
(I'm half considering moving the rodata test into mm/init.c but I'll
wait with that until init.c is unified)

As part of this I had to fix a not-quite-right alignment in the vmlinux.lds.h
for the RODATA sections, which lead to 1 page less being marked read only.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:08 +01:00
Arjan van de Ven
3c1df68b84 x86: make sure initmem is writable
When we free initmem, various rodata and CPA checks may have left
memory read only.. this patch ensures that the memory is writable
before we free it.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:07 +01:00
Thomas Gleixner
d7c8f21a8c x86: cpa: move flush to cpa
The set_memory_* and set_pages_* family of API's currently requires the
callers to do a global tlb flush after the function call; forgetting this is
a very nasty deathtrap. This patch moves the global tlb flush into
each of the callers

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:07 +01:00
Thomas Gleixner
5f5192b9fe x86: move page_is_ram() function
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:06 +01:00
Arjan van de Ven
6d238cc4dc x86: convert CPA users to the new set_page_ API
This patch converts various users of change_page_attr() to the new,
more intent driven set_page_*/set_memory_* API set.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 13:34:06 +01:00
Andi Kleen
934d15854d x86: remove set_kernel_exec()
The SMP trampoline always runs in real mode, so making it executable
in the page tables doesn't make much sense because it executes
before page tables are set up. That was the only user of
set_kernel_exec(). Remove set_kernel_exec().

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:53 +01:00
Andi Kleen
c93c82bbea x86: shrink __PAGE_KERNEL/__PAGE_KERNEL_EXEC on non PAE kernels
No need to make it 64bit there.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:50 +01:00
Huang, Ying
beacfaac3f x86 32-bit boot: rename bt_ioremap() to early_ioremap()
This patch renames bt_ioremap to early_ioremap, which is used in
x86_64. This makes it easier to merge i386 and x86_64 usage.

[ mingo@elte.hu: fix ]

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:44 +01:00
Huang, Ying
0947b2f31c i386 boot: replace boot_ioremap with enhanced bt_ioremap - enhance bt_ioremap
This patch makes it possible for bt_ioremap() to be used before
paging_init(), via providing an early implementation of set_fixmap()
that can be used before paging_init().

This way boot_ioremap() can be replaced by bt_ioremap().

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:44 +01:00
Ingo Molnar
f0646e43ac x86: return the page table level in lookup_address()
based on this patch from Andi Kleen:

|  Subject: CPA: Return the page table level in lookup_address()
|  From: Andi Kleen <ak@suse.de>
|
|  Needed for the next change.
|
|  And change all the callers.

and ported it to x86.git.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:43 +01:00
Andi Kleen
4c3c4b4513 x86: clean up pte_exec
- Rename it to pte_exec() from pte_exec_kernel(). There is nothing
kernel specific in there.
- Move it into the common file because _PAGE_NX is 0 on !PAE and then
pte_exec() will be always evaluate to true.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:42 +01:00
Andi Kleen
0c42f39276 c_p_a(): do a simple self test at boot
When CONFIG_DEBUG_RODATA is enabled undo the ro mapping and redo it again.
This gives some simple testing for change_page_attr().

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:42 +01:00
Jeremy Fitzhardinge
a5a19c63f4 x86: demacro asm-x86/pgalloc_32.h
Convert macros into inline functions, for better type-checking.

This patch required a little bit of fiddling with headers in order to
make __(pte|pmd)_free_tlb inline rather than macros.
asm-generic/tlb.h includes asm/pgalloc.h, though it doesn't directly
use any pgalloc definitions.  I removed this include to avoid an
include cycle, but it may cause secondary compile failures by things
depending on the indirect inclusion; arch/x86/mm/hugetlbpage.c was one
such place; there may be others.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:39 +01:00
Jeremy Fitzhardinge
6c435456dc x86: add mm parameter to paravirt_alloc_pd
Add mm to paravirt_alloc_pd, partly to make it consistent with
paravirt_alloc_pt, and because later changes will make use of it.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:39 +01:00
Jeremy Fitzhardinge
6fdc05d479 x86: unify pgtable accessors which use
Make users of supported_pte_mask common.  This has the side-effect of
introducing the variable for 32-bit non-PAE, but I think its a pretty
small cost to simplify the code.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:32:57 +01:00
Huang, Ying
2215e69d2c x86 boot: use E820 memory map on EFI 32 platform
Because the EFI memory map are converted to e820 memory map in bootloader, the
EFI memory map handling code is removed to clean up.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:19 +01:00
Jeremy Fitzhardinge
f3f20de87c x86: clean up mm/init_32.c
Some code reformatting in init_32.c.  No functional change.

Signed-off-by: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:09 +01:00
Ingo Molnar
23be8c7ddf x86: fix boot crash on HIGHMEM4G && SPARSEMEM
Denys Fedoryshchenko reported a bootup crash when he upgraded
his system from 3GB to 4GB RAM:

   http://lkml.org/lkml/2008/1/7/9

the bug is due to HIGHMEM4G && SPARSEMEM kernels making pfn_to_page()
to return an invalid pointer when the pfn is in a memory hole. The
256 MB PCI aperture at the end of RAM was not mapped by sparsemem,
and hence the pfn was not valid. But set_highmem_pages_init() iterated
this range without checking the pfn's validity first.

this bug was probably present in the sparsemem code ever since sparsemem
has been introduced in v2.6.13. It was masked due to HIGHMEM64G using
larger memory regions in sparsemem_32.h:

 #ifdef CONFIG_X86_PAE
 #define SECTION_SIZE_BITS       30
 #define MAX_PHYSADDR_BITS       36
 #define MAX_PHYSMEM_BITS        36
 #else
 #define SECTION_SIZE_BITS       26
 #define MAX_PHYSADDR_BITS       32
 #define MAX_PHYSMEM_BITS        32
 #endif

which creates 1GB sparsemem regions instead of 64MB sparsemem regions.
So in practice we only ever created true sparsemem holes on x86 with
HIGHMEM4G - but that was rarely used by distros.

( btw., we could probably save 2MB of mem_map[]s on X86_PAE if we reduced
  the sparsemem region size to 256 MB. )

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-15 16:44:37 +01:00
Linus Torvalds
d20ead9e86 Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86
* ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86: (114 commits)
  x86: delete vsyscall files during make clean
  kbuild: fix typo SRCARCH in find_sources
  x86: fix kernel rebuild due to vsyscall fallout
  .gitignore update for x86 arch
  x86: unify include/asm/debugreg_32/64.h
  x86: unify include/asm/unwind_32/64.h
  x86: unify include/asm/types_32/64.h
  x86: unify include/asm/tlb_32/64.h
  x86: unify include/asm/siginfo_32/64.h
  x86: unify include/asm/bug_32/64.h
  x86: unify include/asm/mman_32/64.h
  x86: unify include/asm/agp_32/64.h
  x86: unify include/asm/kdebug_32/64.h
  x86: unify include/asm/ioctls_32/64.h
  x86: unify include/asm/floppy_32/64.h
  x86: apply missing DMA/OOM prevention to floppy_32.h
  x86: unify include/asm/cache_32/64.h
  x86: unify include/asm/cache_32/64.h
  x86: unify include/asm/dmi_32/64.h
  x86: unify include/asm/delay_32/64.h
  ...
2007-10-17 13:13:16 -07:00
Ingo Molnar
509a80c49c x86: fix CONFIG_PAGEALLOC related boot hangs/OOMs
if CONFIG_PAGEALLOC is enabled then X86_FEATURE_PSE is disabled and all
the kernel physical RAM pagetables are set up as 4K pages. This is
needed so that CONFIG_PAGEALLOC can do finegrained mapping and unmapping
of pages.

as a side-effect though, the total size of memory allocated as kernel
pagetables increases significantly. All these pagetables are allocated
via alloc_bootmem_low_pages(), straight out of the lowmem DMA pool. If
the system has enough RAM and a large kernel image then almost all of
the 16 MB lowmem DMA pool is allocated to the image and to pagetables -
leaving no space for __GFP_DMA allocations.

this results in drivers failing and the bootup hanging:

 swapper invoked oom-killer: gfp_mask=0x80d1, order=0, oomkilladj=0
  [<4015059f>] out_of_memory+0x17f/0x1c0
  [<40151f3c>] __alloc_pages+0x37c/0x3a0
  [<40168cd7>] slob_new_page+0x37/0x50
  [<40168dff>] slob_alloc+0x10f/0x190
  [<40169010>] __kmalloc_node+0x80/0x90
  [<405a17e3>] scsi_host_alloc+0x33/0x2c0
  [<405a1a82>] scsi_register+0x12/0x60
  [<40d5889e>] aha1542_detect+0x9e/0x940
  [<405c5ba5>] ultrastor_detect+0x265/0x5f0
  [<401352f5>] getnstimeofday+0x35/0xf0
  [<40d58751>] init_this_scsi_driver+0x41/0xf0
  [<40d0b856>] kernel_init+0x136/0x310
  [<40d58710>] init_this_scsi_driver+0x0/0xf0
  [<40d0b720>] kernel_init+0x0/0x310
  [<40105547>] kernel_thread_helper+0x7/0x10
  =======================

the fix is to first allocate from above the DMA pool, and if that fails
(for example due to it being a machine with less than 16 MB of RAM),
allocate from the DMA pool as a fallback.

With this fix applied i was able to boot a PAGEALLOC=y kernel that would
hang before.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-17 20:15:39 +02:00
Linus Torvalds
fb9fc39517 Merge branch 'xen-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'xen-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xfs: eagerly remove vmap mappings to avoid upsetting Xen
  xen: add some debug output for failed multicalls
  xen: fix incorrect vcpu_register_vcpu_info hypercall argument
  xen: ask the hypervisor how much space it needs reserved
  xen: lock pte pages while pinning/unpinning
  xen: deal with stale cr3 values when unpinning pagetables
  xen: add batch completion callbacks
  xen: yield to IPI target if necessary
  Clean up duplicate includes in arch/i386/xen/
  remove dead code in pgtable_cache_init
  paravirt: clean up lazy mode handling
  paravirt: refactor struct paravirt_ops into smaller pv_*_ops
2007-10-17 11:10:11 -07:00
Jeremy Fitzhardinge
4f81784774 remove dead code in pgtable_cache_init
The conversion from using a slab cache to quicklist left some residual
dead code.

I note that in the conversion it now always allocates a whole page for
the pgd, rather than the 32 bytes needed for a PAE pgd.  Was this
intended?

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
2007-10-16 11:51:29 -07:00
KAMEZAWA Hiroyuki
48e94196a5 fix memory hot remove not configured case.
Now, arch dependent code around CONFIG_MEMORY_HOTREMOVE is a mess.
This patch cleans up them. This is against 2.6.23-rc6-mm1.

 - fix compile failure on ia64/ CONFIG_MEMORY_HOTPLUG && !CONFIG_MEMORY_HOTREMOVE case.
 - For !CONFIG_MEMORY_HOTREMOVE, add generic no-op remove_memory(),
   which returns -EINVAL.
 - removed remove_pages() only used in powerpc.
 - removed no-op remove_memory() in i386, sh, sparc64, x86_64.

 - only powerpc returns -ENOSYS at memory hot remove(no-op). changes it
   to return -EINVAL.

Note:
Currently, only ia64 supports CONFIG_MEMORY_HOTREMOVE. I welcome other
archs if there are requirements and testers.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:02 -07:00
Thomas Gleixner
ad757b6aa5 i386: move mm
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-11 11:16:47 +02:00