This rewrites pretty much from scratch the handling of MMIO and PIO
space allocations on powerpc64. The main goals are:
- Get rid of imalloc and use more common code where possible
- Simplify the current mess so that PIO space is allocated and
mapped in a single place for PCI bridges
- Handle allocation constraints of PIO for all bridges including
hot plugged ones within the 2GB space reserved for IO ports,
so that devices on hotplugged busses will now work with drivers
that assume IO ports fit in an int.
- Cleanup and separate tracking of the ISA space in the reserved
low 64K of IO space. No ISA -> Nothing mapped there.
I booted a cell blade with IDE on PIO and MMIO and a dual G5 so
far, that's it :-)
With this patch, all allocations are done using the code in
mm/vmalloc.c, though we use the low level __get_vm_area with
explicit start/stop constraints in order to manage separate
areas for vmalloc/vmap, ioremap, and PCI IOs.
This greatly simplifies a lot of things, as you can see in the
diffstat of that patch :-)
A new pair of functions pcibios_map/unmap_io_space() now replace
all of the previous code that used to manipulate PCI IOs space.
The allocation is done at mapping time, which is now called from
scan_phb's, just before the devices are probed (instead of after,
which is by itself a bug fix). The only other caller is the PCI
hotplug code for hot adding PCI-PCI bridges (slots).
imalloc is gone, as is the "sub-allocation" thing, but I do beleive
that hotplug should still work in the sense that the space allocation
is always done by the PHB, but if you unmap a child bus of this PHB
(which seems to be possible), then the code should properly tear
down all the HPTE mappings for that area of the PHB allocated IO space.
I now always reserve the first 64K of IO space for the bridge with
the ISA bus on it. I have moved the code for tracking ISA in a separate
file which should also make it smarter if we ever are capable of
hot unplugging or re-plugging an ISA bridge.
This should have a side effect on platforms like powermac where VGA IOs
will no longer work. This is done on purpose though as they would have
worked semi-randomly before. The idea at this point is to isolate drivers
that might need to access those and fix them by providing a proper
function to obtain an offset to the legacy IOs of a given bus.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Track and report the number of times we read an all-1s value (0xff,
0xffff or 0xffffffff) from each device which is valid data, not
indicating EEH isolation.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
arch/powerpc/platforms/pseries/eeh.c | 5 +++++
arch/powerpc/platforms/pseries/eeh_sysfs.c | 3 +++
include/asm-powerpc/pci-bridge.h | 1 +
3 files changed, 9 insertions(+)
Signed-off-by: Paul Mackerras <paulus@samba.org>
Use the correct CONFIG_ option to mark off the EEH bits.
Move the EEH bits to the bottom of the struct.
The config_space array is used by EEH only; it does not
need to be part of the struct for non-pseries machines.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
Revised patch, per commments from Michael Ellerman.
include/asm-powerpc/pci-bridge.h | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
Signed-off-by: Paul Mackerras <paulus@samba.org>
Reserve two TIF flags for perfmon2 and shift them into the low 16 bits
so we can use single assembly instructions to create constants based off
them.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
pte_alloc_one() is expected to return NULL if out of memory.
But it returns virt_to_page(NULL), which is not NULL.
This fixes it.
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Akinobu Mita <mita@fixstars.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
I think we have a subtle race on ppc64 with the tlb batching. The
common code expects tlb_flush() to actually flush any pending TLB
batch. It does that because it delays all page freeing until after
tlb_flush() is called, in order to ensure no stale reference to
those pages exist in any TLB, thus causing potential access to
the freed pages.
However, our tlb_flush only triggers the RCU for freeing page
table pages, it does not currently trigger a flush of a pending
TLB/hash batch, which is, I think, an error. This fixes it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.
This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
getting them indirectly
Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
they don't need sched.h
b) sched.h stops being dependency for significant number of files:
on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
after patch it's only 3744 (-8.3%).
Cross-compile tested on
all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
alpha alpha-up
arm
i386 i386-up i386-defconfig i386-allnoconfig
ia64 ia64-up
m68k
mips
parisc parisc-up
powerpc powerpc-up
s390 s390-up
sparc sparc-up
sparc64 sparc64-up
um-x86_64
x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
as well as my two usual configs.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are no actual implementations of fixup_bigphys_addr() in
arch/powerpc, and with a 64-bit aware ioremap() and so forth, it
should no longer be necessary. This patch removes the last dregs of
fixup_bigphys_addr() from arch/powerpc.
In fact, the only reason this hasn't caused link errors already is
that nobody must have tried using one of the small number of drivers
using io_remap_pfn_range() on one of the small number of platforms
which are 32-bit but define CONFIG_PHYS_64BIT. Nonetheless this fixes
a bug, and should go into 2.6.22.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove CPU_FTR_NEED_COHERENT for MPC7448 (and single-core MPC86xx).
This prevents needlessly setting M=1 when not SMP.
Signed-off-by: James.Yang <James.Yang@freescale.com>
Acked-by: Jon Loeliger <jdl@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Change several headers in include/asm-powerpc that currently use some variation
of ASM_PPC to use ASM_POWERPC instead.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This patch renames the raw hard_irq_{enable,disable} into
__hard_irq_{enable,disable} and introduces a higher level hard_irq_disable()
function that can be used by any code to enforce that IRQs are fully disabled,
not only lazy disabled.
The difference with the __ versions is that it will update some per-processor
fields so that the kernel keeps track and properly re-enables them in the next
local_irq_disable();
This prepares powerpc for my next patch that introduces hard_irq_disable()
generically.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These files are almost all the same.
This patch could be made even simpler if we don't mind POLLREMOVE turning
up in a few architectures that didn't have it previously (which should be
OK as POLLREMOVE is not used anywhere in the current tree).
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds the necessary support to hpte_decode() to handle 1TB
segments and 16GB pages, and removes an uninitialized value
warning on avpn.
We don't have any code to generate HPTEs for 1TB segments or 16GB
pages yet, so this is mostly for completeness, and to fix the
warning.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix a PS3 build error when CONFIG_PS3_SYS_MANAGER=n.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The rheap allocation functions return a pointer, but the actual value is based
on how the heap was initialized, and so it can be anything, e.g. an offset
into a buffer. A ulong is a better representation of the value returned by
the allocation functions.
This patch changes all of the relevant rheap functions to use a unsigned long
integers instead of a pointer. In case of an error, the value returned is
a negative error code that has been cast to an unsigned long. The caller can
use the IS_ERR_VALUE() macro to check for this.
All code which calls the rheap functions is updated accordingly. Macros
IS_MURAM_ERR() and IS_DPERR(), have been deleted in favor of IS_ERR_VALUE().
Also added error checking to rh_attach_region().
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This patch moves a copy of reg_booke.h to include/asm-powerpc and fixes
up the ifdef protection.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Further fixes for the removal of 4level-fixup hack from ppc32
[POWERPC] EEH: log all PCI-X and PCI-E AER registers
[POWERPC] EEH: capture and log pci state on error
[POWERPC] EEH: Split up long error msg
[POWERPC] EEH: log error only after driver notification.
[POWERPC] fsl_soc: Make mac_addr const in fs_enet_of_init().
[POWERPC] Don't use SLAB/SLUB for PTE pages
[POWERPC] Spufs support for 64K LS mappings on 4K kernels
[POWERPC] Add ability to 4K kernel to hash in 64K pages
[POWERPC] Introduce address space "slices"
[POWERPC] Small fixes & cleanups in segment page size demotion
[POWERPC] iSeries: Make HVC_ISERIES the default
[POWERPC] iSeries: suppress build warning in lparmap.c
[POWERPC] Mark pages that don't exist as nosave
[POWERPC] swsusp: Introduce register_nosave_region_late
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
sound: convert "sound" subdirectory to UTF-8
MAINTAINERS: Add cxacru website/mailing list
include files: convert "include" subdirectory to UTF-8
general: convert "kernel" subdirectory to UTF-8
documentation: convert the Documentation directory to UTF-8
Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
remove broken URLs from net drivers' output
Magic number prefix consistency change to Documentation/magic-number.txt
trivial: s/i_sem /i_mutex/
fix file specification in comments
drivers/base/platform.c: fix small typo in doc
misc doc and kconfig typos
Remove obsolete fat_cvf help text
Fix occurrences of "the the "
Fix minor typoes in kernel/module.c
Kconfig: Remove reference to external mqueue library
Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
Correct comments in genrtc.c to refer to correct /proc file.
Fix more "deprecated" spellos.
Fix "deprecated" typoes.
...
Fix trivial comment conflict in kernel/relay.c.
With the advent of kdump, the assumption that the boot CPU when booting an UP
kernel is always the CPU with a particular hardware ID (often 0) (usually
referred to as BSP on some architectures) is not valid anymore. The reason
being that the dump capture kernel boots on the crashed CPU (the CPU that
invoked crash_kexec), which may be or may not be that particular CPU.
Move definition of hard_smp_processor_id for the UP case to
architecture-specific code ("asm/smp.h") where it belongs, so that each
architecture can provide its own implementation.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The SLUB allocator relies on struct page fields first_page and slab,
overwritten by ptl when SPLIT_PTLOCK: so the SLUB allocator cannot then
be used for the lowest level of pagetable pages. This was obstructing
SLUB on PowerPC, which uses kmem_caches for its pagetables. So convert
its pte level to use normal gfp pages (whereas pmd, pud and 64k-page pgd
want partpages, so continue to use kmem_caches for pmd, pud and pgd).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds an option to spufs when the kernel is configured for
4K page to give it the ability to use 64K pages for SPE local store
mappings.
Currently, we are optimistic and try order 4 allocations when creating
contexts. If that fails, the code will fallback to 4K automatically.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds the ability for a kernel compiled with 4K page size
to have special slices containing 64K pages and hash the right type
of hash PTEs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The basic issue is to be able to do what hugetlbfs does but with
different page sizes for some other special filesystems; more
specifically, my need is:
- Huge pages
- SPE local store mappings using 64K pages on a 4K base page size
kernel on Cell
- Some special 4K segments in 64K-page kernels for mapping a dodgy
type of powerpc-specific infiniband hardware that requires 4K MMU
mappings for various reasons I won't explain here.
The main issues are:
- To maintain/keep track of the page size per "segment" (as we can
only have one page size per segment on powerpc, which are 256MB
divisions of the address space).
- To make sure special mappings stay within their allotted
"segments" (including MAP_FIXED crap)
- To make sure everybody else doesn't mmap/brk/grow_stack into a
"segment" that is used for a special mapping
Some of the necessary mechanisms to handle that were present in the
hugetlbfs code, but mostly in ways not suitable for anything else.
The patch relies on some changes to the generic get_unmapped_area()
that just got merged. It still hijacks hugetlb callbacks here or
there as the generic code hasn't been entirely cleaned up yet but
that shouldn't be a problem.
So what is a slice ? Well, I re-used the mechanism used formerly by our
hugetlbfs implementation which divides the address space in
"meta-segments" which I called "slices". The division is done using
256MB slices below 4G, and 1T slices above. Thus the address space is
divided currently into 16 "low" slices and 16 "high" slices. (Special
case: high slice 0 is the area between 4G and 1T).
Doing so simplifies significantly the tracking of segments and avoids
having to keep track of all the 256MB segments in the address space.
While I used the "concepts" of hugetlbfs, I mostly re-implemented
everything in a more generic way and "ported" hugetlbfs to it.
Slices can have an associated page size, which is encoded in the mmu
context and used by the SLB miss handler to set the segment sizes. The
hash code currently doesn't care, it has a specific check for hugepages,
though I might add a mechanism to provide per-slice hash mapping
functions in the future.
The slice code provide a pair of "generic" get_unmapped_area() (bottomup
and topdown) functions that should work with any slice size. There is
some trickiness here so I would appreciate people to have a look at the
implementation of these and let me know if I got something wrong.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
tas() has no users, so get rid of it.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
atomic_add_unless as inline. Remove system.h atomic.h circular dependency.
I agree (with Andi Kleen) this typeof is not needed and more error
prone. All the original atomic.h code that uses cmpxchg (which includes
the atomic_add_unless) uses defines instead of inline functions,
probably to circumvent a circular dependency between system.h and
atomic.h on powerpc (which my patch addresses). Therefore, it makes
sense to use inline functions that will provide type checking.
atomic_add_unless as inline. Remove system.h atomic.h circular dependency.
Digging into the FRV architecture shows me that it is also affected by
such a circular dependency. Here is the diff applying this against the
rest of my atomic.h patches.
It applies over the atomic.h standardization patches.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the size of the per-cpu region reserved to save crash notes is
set by the per-architecture value MAX_NOTE_BYTES. Which in turn is
currently set to 1024 on all supported architectures.
While testing ia64 I recently discovered that this value is in fact too
small. The particular setup I was using actually needs 1172 bytes. This
lead to very tedious failure mode where the tail of one elf note would
overwrite the head of another if they ended up being alocated sequentially
by kmalloc, which was often the case.
It seems to me that a far better approach is to caclculate the size that
the area needs to be. This patch does just that.
If a simpler stop-gap patch for ia64 to be squeezed into 2.6.21(.X) is
needed then this should be as easy as making MAX_NOTE_BYTES larger in
arch/asm-ia64/kexec.h. Perhaps 2048 would be a good choice. However, I
think that the approach in this patch is a much more robust idea.
Acked-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently a parport_driver can't get a handle on the device node for the
underlying parport (PNPACPI, PCI, etc). That prevents correct placement of
sysfs child nodes, which can affect things like power management.
This patch adds a field to "struct parport" pointing to that device node, and
updates non-legacy port drivers to initialize that device pointer. That field
replaces the analagous PCI-only support in parport_pc.
[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch moves the die notifier handling to common code. Previous
various architectures had exactly the same code for it. Note that the new
code is compiled unconditionally, this should be understood as an appel to
the other architecture maintainer to implement support for it aswell (aka
sprinkling a notify_die or two in the proper place)
arm had a notifiy_die that did something totally different, I renamed it to
arm_notify_die as part of the patch and made it static to the file it's
declared and used at. avr32 used to pass slightly less information through
this interface and I brought it into line with the other architectures.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
[bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds support for early serial debugging via the built in
port on IBM/AMCC PowerPC 44x CPUs. It uses a bolted TLB entry in
address space 1 for the UART's mapping, allowing robust debugging both
before and after the initialization of the MMU.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
To support MSI on MPIC we need a way to reserve and allocate hardware irq
numbers, this patch implements an allocator for that purpose.
New firmware platforms must define a "msi-available-ranges" property on their
MPIC node for MSI to work. For U3/U4 we do a best-guess setup.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This provides the architecture specific hooks to support MSI on
powerpc. We implement the newly added arch_setup_msi_irqs() and
arch_teardown_msi_irqs(), and then delegate to ppc_md routines.
Platforms that don't implement MSI will leave the ppc_md calls blank,
arch_msi_check_device() will detect this and return ENOSYS. Drivers
should detect this error and continue to use LSI.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Rip out the existing powerpc msi stubs. These were the start of an
implementation based on ppc_md calls, but were never used in mainline.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
For 32-bit systems, powerpc still relies on the 4level-fixup.h hack,
to pretend that the generic pagetable handling stuff is 3-levels
rather than 4. This patch removes this, instead using the newer
pgtable-nopmd.h to handle the elision of both the pud and pmd
pagetable levels (ppc32 pagetables are actually 2 levels).
This removes a little extraneous code, and makes it more easily
compared to the 64-bit pagetable code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Generalize tsi108_setup_pci to take the config space physical address and
primary bus designator as a parameter.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a phy_type field to the tsi108 ethernet structures to indicate which PHY
is used on a board. This is derived from the "compatible" property in the
ethernet-phy node of the device tree. The default remains the MV88E PHY.
Also, convert the setup code to use of_get_mac_address instead of hard coding
a lookup for the "address" property in the ethernet node.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a header file for the common PCI routines used for the TSI bridge
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Implement deep-sleep on MPC52xx.
SDRAM is put into self-refresh with help of SRAM code
(alternatives would be code in FLASH, I-cache).
Interrupt code must also not be in SDRAM, so put it
in I-cache.
MPC52xx core is static, so contents will remain intact even
with clocks turned off.
Signed-off-by: Domen Puncer <domen.puncer@telargo.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Apparently other parts of the kernel need to know the
modalias internally (like the sysfs code in macintosh driver).
To avoid consistency issues, we export this code and use it
everywhere it's needed rather than repeat it ...
Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>