Currently, we never set _PAGE_COHERENT in the PTEs, we just OR it in
in the hash code based on some CPU feature bit. We also manipulate
_PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places.
This changes the logic so that instead, the PTE now contains
_PAGE_COHERENT for all normal RAM pages thay have I = 0 on platforms
that need it. The hash code clears it if the feature bit is not set.
It also adds some clean accessors to setup various valid combinations
of access flags and change various bits of code to use them instead.
This should help having the PTE actually containing the bit
combinations that we really want.
I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead
set it explicitely from the TLB miss. I will ultimately remove it
completely as it appears that it might not be needed after all
but in the meantime, having it in the TLB miss makes things a
lot easier.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This makes the MMU context code used for CPUs with no hash table
(except 603) dynamically allocate the various maps used to track
the state of contexts.
Only the main free map and CPU 0 stale map are allocated at boot
time. Other CPU maps are allocated when those CPUs are brought up
and freed if they are unplugged.
This also moves the initialization of the MMU context management
slightly later during the boot process, which should be fine as
it's really only needed when userland if first started anyways.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, the various forms of low level TLB invalidations are all
implemented in misc_32.S for 32-bit processors, in a fairly scary
mess of #ifdef's and with interesting duplication such as a whole
bunch of code for FSL _tlbie and _tlbia which are no longer used.
This moves things around such that _tlbie is now defined in
hash_low_32.S and is only used by the 32-bit hash code, and all
nohash CPUs use the various _tlbil_* forms that are now moved to
a new file, tlb_nohash_low.S.
I moved all the definitions for that stuff out of
include/asm/tlbflush.h as they are really internal mm stuff, into
mm/mmu_decl.h
The code should have no functional changes. I kept some variants
inline for trivial forms on things like 40x and 8xx.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This commit moves the whole no-hash TLB handling out of line into a
new tlb_nohash.c file, and implements some basic SMP support using
IPIs and/or broadcast tlbivax instructions.
Note that I'm using local invalidations for D->I cache coherency.
At worst, if another processor is trying to execute the same and
has the old entry in its TLB, it will just take a fault and re-do
the TLB flush locally (it won't re-do the cache flush in any case).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We're soon running out of CPU features and I need to add some new
ones for various MMU related bits, so this patch separates the MMU
features from the CPU features. I moved over the 32-bit MMU related
ones, added base features for MMU type families, but didn't move
over any 64-bit only feature yet.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This splits the mmu_context handling between 32-bit hash based
processors, 64-bit hash based processors and everybody else. This is
preliminary work for adding SMP support for BookE processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds supports to the "extended" DCR addressing via the indirect
mfdcrx/mtdcrx instructions supported by some 4xx cores (440H6 and
later).
I enabled the feature for now only on AMCC 460 chips.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Using the common code means that more complete cache information will
provided in sysfs on platforms that don't use the l2-cache property
convention.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The smp code uses cache information to populate cpu_core_map; change
it to use common code for cache lookup.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We have more than one piece of code that looks up cache nodes manually
using the "l2-cache" property. Add a common helper routine which does
this and handles ePAPR's "next-level-cache" property as well as
powermac.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since "Factor out cpu joining/unjoining the GIQ"
(b4963255ad) the WARN_ON in
xics_set_cpu_giq() is being triggered during boot on JS20 because the
GIQ indicator is not available on that platform. While the warning is
harmless and the system runs normally, it's nicer to check for the
existence of the indicator before trying to manipulate it.
Implement rtas_indicator_present(), which searches the
/rtas/rtas-indicators property for the given indicator token, and use
this function in xics_set_cpu_giq().
Also use a WARN statement in xics_set_cpu_giq to get better
information on failure.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
smp_hw_index isn't used on 64-bit, so move it from smp.c to
setup_32.c.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The `have_of' variable is a relic from the arch/ppc time, it isn't
useful nowadays.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
An example calling sequence which we did see:
copy_user_highpage -> kmap_atomic -> flush_tlb_page -> _tlbil_va
We got interrupted after setting up the MAS registers before the
tlbwe and the interrupt handler that caused the interrupt also did
a kmap_atomic (ide code) and thus on returning from the interrupt
the MAS registers no longer contained the proper values.
Since we dont save/restore MAS registers for normal interrupts we
need to disable interrupts in _tlbil_va to ensure atomicity.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The 440x5 core in the Virtex5 uses the 440A type machine check
(ie, they have MCSRR0/MCSRR1). They thus need to call the
appropriate fixup function to hook the right variant of the
exception.
Without this, all machine checks become fatal due to loss
of context when entering the exception handler.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Added 85xx specifc smp_ops structure. We use ePAPR style boot release
and the MPIC for IPIs at this point.
Additionally added routines for secondary cpu entry and initializtion.
Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The initial TLB mapping for the kernel boot didn't set the memory coherent
attribute, MAS2[M], in SMP mode.
If this code supported booting a secondary processor, which it doesn't yet,
but if it did, then when a secondary processor boots, it would probably signal
the primary processor by setting a variable called something like
__secondary_hold_acknowledge. However, due to the lack of the M bit, the
primary processor would not snoop the transaction (even if a transaction were
broadcast). If primary CPU's L1 D-cache had a copy, it would not be flushed
and the CPU would never see the ack. Which would have resulted in the primary
CPU spinning for a long time, perhaps a full second before it gives up, while
it would have waited for the ack from the secondary CPU that it wouldn't have
been able to see because of the stale cache.
The value of MAS2 for the boot page TLB1 entry is a compile time constant,
so there is no need to calculate it in powerpc assembly language.
Also, from the MPC8572 manual section 6.12.5.3, "Bits that represent
offsets within a page are ignored and should be cleared." Existing code
didn't clear them, this code does.
The same when the page of KERNELBASE is found; we don't need to use asm to
mask the lower 12 bits off.
In the code that computes the address to rfi from, don't hard code the
offset to 24 bytes, but have the assembler figure that out for us.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This patch add the handlers of SPE/EFP exceptions.
The code is used to emulate float point arithmetic,
when MSR(SPE) is enabled and receive EFP data interrupt or EFP round interrupt.
This patch has no conflict with or dependence on FP math-emu.
The code has been tested by TestFloat.
Now the code doesn't support SPE/EFP instructions emulation
(it won't be called when receive program interrupt),
but it could be easily added.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
ibmebus_free_irq() frees the IRQ but does not remove its mapping, which
results in stale entries in the map.
This fixes it by adding a call to irq_dispose_mapping() in
ibmebus_free_irq().
Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
As noted by Akinobu Mita in commit b1fceac2 ("x86: remove unnecessary
memset and NULL check after alloc_bootmem()"), alloc_bootmem and
related functions never return NULL and always return a zeroed region
of memory. Thus a NULL test or memset after calls to these functions
is unnecessary.
This was fixed using the following semantic patch.
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
expression E;
statement S;
@@
E = \(alloc_bootmem\|alloc_bootmem_low\|alloc_bootmem_pages\|alloc_bootmem_low_pages\|alloc_bootmem_node\|alloc_bootmem_low_pages_node\|alloc_bootmem_pages_node\)(...)
... when != E
(
- BUG_ON (E == NULL);
|
- if (E == NULL) S
)
@@
expression E,E1;
@@
E = \(alloc_bootmem\|alloc_bootmem_low\|alloc_bootmem_pages\|alloc_bootmem_low_pages\|alloc_bootmem_node\|alloc_bootmem_low_pages_node\|alloc_bootmem_pages_node\)(...)
... when != E
- memset(E,0,E1);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We need to swap these out once we start using swiotlb, so add
them to dma_ops. Create CONFIG_PPC_NEED_DMA_SYNC_OPS Kconfig
option; this is currently enabled automatically if we're
CONFIG_NOT_COHERENT_CACHE. In the future, this will also
be enabled for builds that need swiotlb. If PPC_NEED_DMA_SYNC_OPS
is not defined, the dma_sync_*_for_* ops compile to nothing.
Otherwise, they access the dma_ops pointers for the sync ops.
This patch also changes dma_sync_single_range_* to actually
sync the range - previously it was using a generous
dma_sync_single. dma_sync_single_* is now implemented
as a dma_sync_single_range with an offset of 0.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On my screen, when something crashes, I only have space for maybe 16
functions of the stack trace before the information above it scrolls
off the screen. It's easy to hack the kernel to print out only that
much, but it's harder to remember to do it. This introduces a config
option for it so that I can keep the setting in my config.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On PowerPC 4xx or other non cache-coherent platforms, we lost the
appropriate cache flushing in dma_map_sg() when merging the 32 and
64-bit DMA code (commit 4fc665b88a,
"powerpc: Merge 32 and 64-bit dma code"). This restores it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
attr_smt_snooze_delay is only defined for CONFIG_PPC64, so protect the
attribute removal with the same condition. This fixes this build error
on 32-bit SMP configurations:
/data/home/miltonm/next.git/arch/powerpc/kernel/sysfs.c: In function ‘unregister_cpu_online’:
/data/home/miltonm/next.git/arch/powerpc/kernel/sysfs.c:722: error: ‘attr_smt_snooze_delay’ undeclared (first use in this function)
/data/home/miltonm/next.git/arch/powerpc/kernel/sysfs.c:722: error: (Each undeclared identifier is reported only once
/data/home/miltonm/next.git/arch/powerpc/kernel/sysfs.c:722: error: for each function it appears in.)
Signed-off-by: Paul Mackerras <paulus@samba.org>
It turns out that on Cell, on a kernel with CONFIG_VIRT_CPU_ACCOUNTING
= y, if a program sets the SO (summary overflow) bit in the XER and
then does a system call, the SO bit in CR0 will be set on return
regardless of whether the system call detected an error. Since CR0.SO
is used as the error indication from the system call, this means that
all system calls appear to fail.
The reason is that the workaround for the timebase bug on Cell uses a
compare instruction. With CONFIG_VIRT_CPU_ACCOUNTING = y, the
ACCOUNT_CPU_USER_ENTRY macro reads the timebase, so we end up doing a
compare instruction, which copies XER.SO to CR0.SO. Since we were
doing this in the system call entry patch after clearing CR0.SO but
before saving the CR, this meant that the saved CR image had CR0.SO
set if XER.SO was set on entry.
This fixes it by moving the clearing of CR0.SO to after the
ACCOUNT_CPU_USER_ENTRY call in the system call entry path.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, some PCIe devices on POWER6 machines do not get interrupts
assigned correctly. The problem is that OF doesn't create an
"interrupt" property for them. The fix is for of_irq_map_pci to fall
back to using the value in the PCI interrupt-pin register in config
space, as we do when there is no OF device-tree node for the device.
I have verified that this works fine with a pair of Squib-E SAS
adapter on a P6-570.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
With the new generic smp call function helpers, I noticed the code in
smp_message_recv was a single function call in many cases. While
getting the message number from the ipi data is easy, we can reduce
the path length by a function and data-dependent switch by registering
seperate IPI actions for these simple calls.
Originally I left the ipi action array exposed, but then I realized the
registration code should be common too.
The three users each had their own name array, so I made a fourth
to convert all users to use a common one.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Those cores use the 440A type machine check (ie, they have
MCSRR0/MCSRR1). They thus need to call the appropriate fixup
function to hook the right variant of the exception.
Without this, all machine checks become fatal due to loss
of context when entering the exception handler.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
The new context may not be 16-byte aligned, so the real address of the
mcontext structure should be read from the uc_regs pointer instead of
directly using the (unaligned) uc_mcontext field.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since we started using the generic timekeeping code, we haven't had a
powerpc-specific version of do_gettimeofday, and hence there is now
nothing that reads the do_gtod variable in arch/powerpc/kernel/time.c.
This therefore removes it and the code that sets it.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently the clock_gettime implementation in the VDSO produces a
result with microsecond resolution for the cases that are handled
without a system call, i.e. CLOCK_REALTIME and CLOCK_MONOTONIC. The
nanoseconds field of the result is obtained by computing a
microseconds value and multiplying by 1000.
This changes the code in the VDSO to do the computation for
clock_gettime with nanosecond resolution. That means that the
resolution of the result will ultimately depend on the timebase
frequency.
Because the timestamp in the VDSO datapage (stamp_xsec, the real time
corresponding to the timebase count in tb_orig_stamp) is in units of
2^-20 seconds, it doesn't have sufficient resolution for computing a
result with nanosecond resolution. Therefore this adds a copy of
xtime to the VDSO datapage and updates it in update_gtod() along with
the other time-related fields.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This does a few cosmetic cleanups, moving a couple of things around
but without actually changing what the code does.
(There is a minor change in ordering of operations in
pcibios_setup_bus_devices but it should have no impact).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The pseries PCI hotplug code has a number of issues, ranging from
incorrect resource setup to crashes, depending on what is added,
when, whether it contains a bridge, etc etc....
This fixes a whole bunch of these, while actually simplifying the code
a bit, using more generic code in the process and factoring out common
code between adding of a PHB, a slot or a device.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
To properly fix PCI hotplug, it's useful to be able to make the fixup
passes on all devices whether they were just hot plugged or already
there.
However, pcibios_allocate_bus_resources() wouldn't cope well with
being called twice for a given bus. This makes it ignore resources
that have already been allocated, along with adding a bit of debug
output.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, our PCI code uses the pcibios_fixup_bus() callback, which
is called by the generic code when probing PCI buses, for two
different things.
One is to set up things related to the bus itself, such as reading
bridge resources for P2P bridges, fixing them up, or setting up the
iommu's associated with bridges on some platforms.
The other is some setup for each individual device under that bridge,
mostly setting up DMA mappings and interrupts.
The problem is that this approach doesn't work well with PCI hotplug
when an existing bus is re-probed for new children. We fix this
problem by splitting pcibios_fixup_bus into two routines:
pcibios_setup_bus_self() is now called to setup the bus itself
pcibios_setup_bus_devices() is now called to setup devices
pcibios_fixup_bus() is then modified to call these two after reading the
bridge bases, and the OF based PCI probe is modified to avoid calling
into the first one when rescanning an existing bridge.
[paulus@samba.org - fixed eeh.h for 32-bit compile now that pci-common.c
is including it unconditionally.]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The function pcibios_do_bus_setup() was used by pcibios_fixup_bus()
to perform setup that is different between the 32-bit and 64-bit
code. This difference no longer exists, thus the function is removed
and the setup now done directly from pci-common.c.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 32-bit and 64-bit powerpc PCI code used to set up the resource
pointers of the root bus of a given PHB in completely different
places.
This unifies this in large part, by making 32-bit use a routine very
similar to what 64-bit does when initially scanning the PCI busses.
The actual setup of the PHB resources itself is then moved to a
common function in pci-common.c.
This should cause no functional change on 64-bit. On 32-bit, the
effect is that the PHB resources are going to be setup a bit earlier,
instead of being setup from pcibios_fixup_bus().
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This removes the various DBG() macro from the powerpc PCI code and
makes it use the standard pr_debug instead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A new field has been added to the VPA as a method for the client OS to
communicate to firmware the number of page-ins it is performing when
running collaborative memory overcommit. The hypervisor will use this
information to better determine if a partition is experiencing memory
pressure and needs more memory allocated to it.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When no hardware method is provided to sync the timebase registers
across the machine, and the platform doesn't sync them for us, then we
use a generic software implementation. Currently, the code for that
has many printks, and they don't have log levels. Most of the printks
are only useful for debugging the code, and since we haven't had any
problems with it for years, this turns them into pr_debug.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>