The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
We noticed that notifier chains in the kernel fall into two basic usage
classes:
"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;
"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.
We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.
With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)
There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)
Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.
Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.
ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain
BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain
It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)
The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.
[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It has been decreed that platform numbers are evil, so as a step in that
direction, replace platform_is_lpar() with a FW_FEATURE_LPAR bit.
Currently FW_FEATURE_LPAR really means i/pSeries LPAR, in the future we might
have to clean that up if we need to be more specific about what LPAR actually
means. But that's another patch ...
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When iommu_init_early_pSeries() was added, ages ago, we forgot to remove
the code that checks /chosen/linux,iommu-off in pSeries_init_early(). We
do it now in iommu_init_early_pSeries().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The dynamic add path for PCI Host Bridges can fail to configure children
adapters under P5IOC controllers. It fails to properly fixup bus/device
resources, and it fails to properly enable EEH. Both of these steps
need to occur before any children devices are enabled in
pci_bus_add_devices().
Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The lparcfg code needs several things which are pretty arcane internal
details and which we don't want to export, which means that lparcfg
doesn't work when built as a module. This makes it a bool instead of
a tristate in the Kconfig so that users can't try to build it as a
module.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Some hotplug driver functions were migrated to the kernel for use by EEH
in commit 2bf6a8fa21.
Previously, the PCI Hotplug module had been changed to use the new
OFDT-based PCI probe when appropriate:
5fa80fcdca
When rpaphp_pci_config_slot() was moved from the rpaphp driver to the
new kernel function pcibios_add_pci_devices(), the OFDT-based probe
stuff was dropped. This patch restores it.
Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
HMT support is currently broken and needs to be reworked to play nicely
with the SMT scheduler. Remove the bit rotten bits for the time being.
I also updated an incorrect comment, we enter __secondary_hold with the
physical cpu id in r3.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If the logical and physical cpu ids of a secondary thread don't match, we will
fail to spin the thread up on pSeries machines due to a bug in pseries/smp.c
We call the RTAS "start-cpu" method with the physical cpu id, the address of
pSeries_secondary_smp_init and the value to pass that function in r3. Currently
we pass "lcpu", the logical cpu id, but pSeries_secondary_smp_init expects
the physical cpu id in r3.
We should be passing pcpu instead.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch removes all self references and fixes references to files
in the now defunct arch/ppc64 tree. I think this accomplises
everything wanted, though there might be a few references I missed.
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently we have some stuff in firmware.h and kernel/firmware.c that is
#ifdef CONFIG_PPC_PSERIES. Move it all into platforms/pseries.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Clean up fw_feature_init in platforms/pseries/setup.c. Clean up white space
and replace the while loop with a for loop - which seems clearer to me.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We call unregister_vpa but we don't check to see if the hypervisor
supports this.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Anton Blanchard <anton@samba.org>
--
arch/powerpc/platforms/pseries/setup.c | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)
Signed-off-by: Paul Mackerras <paulus@samba.org>
Build break: Building PCI hotplug on PowerPC results in
a build break, due to failure to export symbols.
Reported today by Dave Jones <davej@redhat.com>:
drivers/pci/hotplug/rpaphp.ko needs unknown symbol pcibios_add_pci_devices
This patch fixes the break in the arch/powerpc tree.
Next patch fixes same problem in drivers/pci tree
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
At present the lppaca - the structure shared with the iSeries
hypervisor and phyp - is contained within the PACA, our own low-level
per-cpu structure. This doesn't have to be so, the patch below
removes it, making a separate array of lppaca structures.
This saves approximately 500*NR_CPUS bytes of image size and kernel
memory, because we don't need aligning gap between the Linux and
hypervisor portions of every PACA. On the other hand it means an
extra level of dereference in many accesses to the lppaca.
The patch also gets rid of several places where we assign the paca
address to a local variable for no particular reason.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add support to reconfigure the device tree through the existing
proc filesystem interface. Add "add_property", "remove_property",
and "update_property" commands to the existing interface.
Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
These symbols are only used in the file that they are defined in,
so they should not be in the global namespace.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove warning in eeh code about mixed variables and code.
Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes a crash on null-pointer deref during dlpar slot addition.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 1c87c0f84943fbbc91826967ff4fea1b059a526f commit)
<asm/systemcfg.h> is gone now, and the PCI error recovery constants
in include/linux/pci.h changed their names in the process of getting
accepted.
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 5a2516156c591fc3d2059fbd93f97e15eb6010d6 commit)
242-eeh-no-percpu-counters.patch
Remove per-cpu counters from the EEH code. These statistics counters
are incremented at a very low frequency, and the performance gains of
per-cpu variables are negligable. By contrast, the counters weren't
safe against cpu off/online operations, and its not worth the effort
to make them so (other than to turn them into plain globals).
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from be3b5d1be053ccb41e91fa5a6f43ef5db301357d commit)
241-eeh-save-bars-earlier.patch
Save the PCI device bars *before* any PCI probing is done.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 76c902b919098860f3d4e125f847abcc4cb1782a commit)
239-eeh-multifunction-consolidate.patch
New-style firmware will often place multiple different functions
under a non-EEH-aware parent. However, these devices might share
a common PE "partition endpoint" and config address, ad thus any
EEH events will affect all of the devices in common. This patch
makes the effort to find all of these common devices and handle
them together.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 216810296bb97d39da8e176822e9de78d2f00187 commit)
238-eeh-stop-if-reset_failed.patch
If the firmware is unable to reset the PCI slot for some reason, then
don't attempt any further recovery steps after that point. Instead,
mark the device as permanently failed.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from e06b942521eb2cdaf232726f45a820d5837acb12 commit)
237-eeh-bridge-token.patch
Minor: the rtas-bridge token should be set up the same way that all
the other rtas tokens are set up.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 78379b6c5fc17b6666c40b05988e6708e98479c0 commit)
236-eeh-config-addr.patch
The PE configuration address wasn't being cnsistently used in all locations
where a config address is called for. This patch adds it to the places it
should have appeared in.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from c2bc904a28095aca0b04a37854b63b78622a032e commit)
235-eeh-set-pcidev-bugfix.patch
The pci device field of the pci_dn struct should be initialized to a
valid value.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from beb45c93d494a11c36e5b24f638e610db8428b54 commit)
234-eeh-find-pe.patch
The find_device_pe() routine is duplicated in two files. Remove one of
the two copies, declare the other extern.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 48408e708282d4d0269136ff27ea5acbd9410b5a commit)
26-eeh-partition-endpoint.patch
New versions of firmware introduce a new method by which the
"partitionable endpoint" (the point at which the pci bus is cut)
should be located. This code adds the support for this (mandatory)
new feature.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 9fcfb5d35b5294659f9299aa9cae6fd16325c07e commit)
25-pci-address-cache.patch
The core EEH file is rather large. This patch splits out a self-contained
chunk of it into its own file. This is the chunk that performes the
caching and lookup of pci devices based on the i/o addresses of thier
resoures. This code is almos architecture-independent and could be
used by any system that wanted to find a pci device based only on
the i/o address used by the device.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from b0b291d59906d4a9a89ed9e34d9fd684c7188924 commit)
Various PCI bus errors can be signaled by newer PCI controllers. The
core error recovery routines are architecture dependent. This patch adds
a recovery infrastructure for the PPC64 pSeries systems.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from e8ca11b460c4c9c7fa6b529be221529ebd770e38 commit)
This patch enables support for pause(0) power management state
for the Cell Broadband Processor, which is import for power efficient
operation. The pervasive infrastructure will in the future enable
us to introduce more functionality specific to the Cell's
pervasive unit.
From: Maximino Aguilar <maguilar@us.ibm.com>
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At present, we are not looking at all interrupt controller nodes in the
device tree even though the proper node was not found. This is causing
the system panic. The attached patch will scan all nodes until it finds
the proper interrupt controller type.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There is code in the RPAPHP directory that is identical to this routine;
I'll be removing that code in an upcoming patch, but this patch is needed
to expose the function to make it callable.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Implementing the machine_crash_shutdown which will be called by
crash_kexec (called in case of a panic, sysrq etc.). Disable the
interrupts, shootdown cpus using debugger IPI and collect regs
for all CPUs.
elfcorehdr= specifies the location of elf core header stored by
the crashed kernel. This command line option will be passed by
the kexec-tools to capture kernel.
savemaxmem= specifies the actual memory size that the first kernel
has and this value will be used for dumping in the capture kernel.
This command line option will be passed by the kexec-tools to
capture kernel.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The fwnmi vectors can be anywhere < 32 MB, so we need to use a trampoline
for them. The kdump kernel will register the trampoline addresses, which will
then jump up to the real code above 32 MB.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Minor: use macro to perform void pointer deref; this may someday help
avoid pointer typecasting errors.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The udbg low level io layer has an issue with udbg_getc() returning a
char (unsigned on ppc) instead of an int, thus the -1 if you had no
available input device could end up turned into 0xff, filling your
display with bogus characters. This fixes it, along with adding a little
blob to xmon to do a delay before exiting when getting an EOF and fixing
the detection of ADB keyboards in udbg_adb.c
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
23-rpaphp-migrate.patch (parts)
This patch moves some pci device add & remove code from the PCI
hotplug directory to the arch/powerpc/kernel directory, and cleans
it up a tad. The primary reason for this is that the code performs
some fairly generic operations that are shared with the PCI error
recovery code (living in the arch/powerpc/kernel directory).
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
20-rpaphp-eeh-cleanup.patch
This patch move some code from the rpaphp directory, to the powerpc
directory, where it should have been all along (Among other things, I
need it in the powerpc directory for the PCI error recovery.)
Please note that patch affects TWO maintainers: Paul, after applying
the powerpc part, please ask that GregKH appli the PCI part. It is safe
to have the powerpc part go in first. It would be bad to have the
PCI part go in first.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch unifies udbg for both ppc32 and ppc64 when building the
merged achitecture. xmon now has a single "back end". The powermac udbg
stuff gets enriched with some ADB capabilities and btext output. In
addition, the early_init callback is now called on ppc32 as well,
approx. in the same order as ppc64 regarding device-tree manipulations.
The init sequences of ppc32 and ppc64 are getting closer, I'll unify
them in a later patch.
For now, you can force udbg to the scc using "sccdbg" or to btext using
"btextdbg" on powermacs. I'll implement a cleaner way of forcing udbg
output to something else than the autodetected OF output device in a
later patch.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves the discovery of legacy serial ports to a separate file,
makes it common to ppc32 and ppc64, and reworks it to use the new OF
address translators to get to the ports early. This new version can also
detect some PCI serial cards using legacy chips and will probably match
those discovered port with the default console choice.
Only ppc64 gets udbg still yet, unifying udbg isn't finished yet.
It also adds some speed-probing code to udbg so that the default console
can come up at the same speed it was set to by the firmware.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>