Various SMP balancing algorithms require that the bandwidth period
run in sync.
Possible improvements are moving the rt_bandwidth thing into root_domain
and keeping a span per rt_bandwidth which marks throttled cpus.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch changes smpboot.c so that it can start slave cpus running
in UV non-unique apicid mode. The SIPI must be sent using a UV-specific
mechanism.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
They are placed in an ifdef, since they are i386 specific
the structure definition goes to dma-mapping.h.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
we merge the iommu initialization parameters in pci-dma.c
Nice thing, that both architectures at least recognize the same
parameters.
usedac i386 parameter is marked for deprecation
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
via_no_dac provides a fixup that is the same for both
architectures. Move it to pci-dma.c.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is done to get the code closer to x86_64.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
all the code that is left is ready to be merged as-is
in dma-mapping.h.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
define it conditionally to i386.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We provide a map_error function in pci-base_32.c to make
sure i386 keeps with the same behaviour it used to.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It's initially 0, since we don't expect any DMA there.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Do it instead of using the conservative approach we're currently
doing. This is the way x86_64 does, and this patch makes this piece
of code the same between them, ready to be integrated.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is the way x86_64 does, so this make them equal. They have
to be extern now in the header, and the extern definition is moved to
the common dma-mapping.h header.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
they are the same in both architectures.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
They are similar enough to do this move.
the macro version is ugly, and we use inline functions instead.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
they are the same between architectures. (except for the fact
that x86_64 has duplicate code)
move them to dma-mapping.h
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
the old i386 implementation is moved to pci-base_32.c
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
i386 base does not need it, so it gets an empty function.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
That's already the name of the game for x86_64. For i386,
we add a pci-base_32.c, that will hold the default operations.
The function call itself goes through dma-mapping.h , the common
header
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
take it off the x86_64 specific header
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
a system with 256 GB of RAM, when NUMA is disabled crashes the
following way:
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Cannot allocate aperture memory hole (ffff8101c0000000,65536K)
Kernel panic - not syncing: Not enough memory for aperture
Pid: 0, comm: swapper Not tainted 2.6.25-rc4-x86-latest.git #33
Call Trace:
[<ffffffff84037c62>] panic+0xb2/0x190
[<ffffffff840381fc>] ? release_console_sem+0x7c/0x250
[<ffffffff847b1628>] ? __alloc_bootmem_nopanic+0x48/0x90
[<ffffffff847b0ac9>] ? free_bootmem+0x29/0x50
[<ffffffff847ac1f7>] gart_iommu_hole_init+0x5e7/0x680
[<ffffffff847b255b>] ? alloc_large_system_hash+0x16b/0x310
[<ffffffff84506a2f>] ? _etext+0x0/0x1
[<ffffffff847a2e8c>] pci_iommu_alloc+0x1c/0x40
[<ffffffff847ac795>] mem_init+0x45/0x1a0
[<ffffffff8479ff35>] start_kernel+0x295/0x380
[<ffffffff8479f1c2>] _sinittext+0x1c2/0x230
the root cause is : memmap PMD is too big,
[ffffe200e0600000-ffffe200e07fffff] PMD ->ffff81383c000000 on node 0
almost near 4G..., and vmemmap_alloc_block will use up the ram under 4G.
solution will be:
1. make memmap allocation get memory above 4G...
2. reserve some dma32 range early before we try to set up memmap for all.
and release that before pci_iommu_alloc, so gart or swiotlb could get some
range under 4g limit for sure.
the patch is using method 2.
because method1 may need more code to handle SPARSEMEM and SPASEMEM_VMEMMAP
will get
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 4000000
Memory: 264245736k/268959744k available (8484k kernel code, 4187464k reserved, 4004k data, 724k init)
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For example, If the physical address layout on a two node system with 8 GB
memory is something like:
node 0: 0-2GB, 4-6GB
node 1: 2-4GB, 6-8GB
Current kernels fail to boot/detect this NUMA topology.
ACPI SRAT tables can expose such a topology which needs to be supported.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Only allocate the FPU area when the application actually uses FPU, i.e., in the
first lazy FPU trap. This could save memory for non-fpu using apps.
for example: on my system after boot, there are around 300 processes, with
only 17 using FPU.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Split the FPU save area from the task struct. This allows easy migration
of FPU context, and it's generally cleaner. It also allows the following
two optimizations:
1) only allocate when the application actually uses FPU, so in the first
lazy FPU trap. This could save memory for non-fpu using apps. Next patch
does this lazy allocation.
2) allocate the right size for the actual cpu rather than 512 bytes always.
Patches enabling xsave/xrstor support (coming shortly) will take advantage
of this.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
this function doesnt just 'find' the max_pfn - it also has
other side-effects such as registering sparse memory maps.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch implements the PR_GET_TSC and PR_SET_TSC prctl()
commands on the x86 platform (both 32 and 64 bit.) These
commands control the ability to read the timestamp counter
from userspace (the RDTSC instruction.)
While the RDTSC instuction is a useful profiling tool,
it is also the source of some non-determinism in ring-3.
For deterministic replay applications it is useful to be
able to trap and emulate (and record the outcome of) this
instruction.
This patch uses code earlier used to disable the timestamp
counter for the SECCOMP framework. A side-effect of this
patch is that the SECCOMP environment will now also disable
the timestamp counter on x86_64 due to the addition of the
TIF_NOTSC define on this platform.
The code which enables/disables the RDTSC instruction during
context switches is in the __switch_to_xtra function, which
already handles other unusual conditions, so normal
performance should not have to suffer from this change.
Signed-off-by: Erik Bosman <ejbosman@cs.vu.nl>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds prctl commands that make it possible
to deny the execution of timestamp counters in userspace.
If this is not implemented on a specific architecture,
prctl will return -EINVAL.
ned-off-by: Erik Bosman <ejbosman@cs.vu.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The kernel decompressor wrapper uses memory located beyond the
end of the image. This might lead to hard to debug problems,
but even if it can be proven to be safe, it is at the very
least unclean. I don't see any advantages either, unless you
count it not being zeroed out as an advantage. This patch
moves the boot-heap area to the bss segment.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
irqs_disabled() uses flags internally, use _flags to avoid shadowing
code calling into this macro.
Introduced between 2.6.25-rc3 and -rc4
Fixes the sparse warning:
arch/x86/mm/pageattr.c:383:21: warning: symbol 'flags' shadows an earlier one
arch/x86/mm/pageattr.c:369:16: originally declared here
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make x86 EFI code works when EFI_PAGE_SHIFT != PAGE_SHIFT. The
memrage_efi_to_native() provided in this patch can be used on other
EFI platform such as IA64 too.
This patch has been tested on Intel x86_64 platform with EFI 64/32
firmware.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds a driver for the Quick Capture Interface on the PXA270.
It is based on the original driver from Intel, but has been re-worked
multiple times since then, now it also supports the V4L2 API.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Adds support for the generic GPIO lib to the EP93xx family. The gpio
handling code has been moved from core.c to a new file called gpio.c.
The GPIO based IRQ code has not been changed.
Signed-off-by: Ryan Mallon <ryan@bluewatersys.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This adds support for two more leds:
the wlan one (found in SL-6000W and SL-6000L) and
the blutooth one (found in SL-6000W).
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Now that scoop gpio's are converted to generic_gpio,
tosascoop_device and tosascoop_jc_device don't have
to be exported.
Also make tosa_gpio_* static
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Set up the IRQ line for the WM9713 device on the Zylonite.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Now as the scoop pins are covered by the generic gpio API,
we can use leds-gpio driver instead of special leds-tosa.
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Convert set/reset_scoop_gpio to generic gpio calls.
This patch depends on the pxaficp_ir hooks patch.
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Let platform do some specific initialisation and cleanup
things during pxaficp_ir probing and removing. E.g. this
can be usefull to request/free gpios used by the platform
to control the transceiver.
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This driver will provide registers, clocks and GPIOs of
the HTC PASIC3 (AIC3) and PASIC2 (AIC2) chips to the
ds1wm and leds-pasic3 drivers.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch fixes misprint in definition of CICR1_RGBT_CONV in include/asm-arm/arch-pxa/pxa-regs.h
Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
All magician devices I've encountered so far have featured the Toppoly
TD028STEB1 display, so the Samsung LTP280QV support is untested.
The power-on sequence is not correct because pxafb doesn't yet support
enabling the LCD controller in the middle of the it.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
needed for power management (audio, BT, charging, GSM, LCD, SD), GSM, flash and SD operation and audio routing.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
implemented in CPLD chips on several HTC devices.
The original driver was written by Kevin O'Connor, I have adapted it to
use gpiolib and made the bus/register widths configurable.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
PXA GPIO definitions were split from pxa-regs.h into pxa2xx-gpio.h.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch implements support for Gumstix-F flash, udc and mci. Fixes since the last time are:
- Steve Sakoman as maintainer
- cleanup for udc and mci setup
Signed-off-by: Jaya Kumar <jayakumar.lkml@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Keypad registers are now fully defined within pxa27x-keypad.c, no
need to keep those definitions in pxa-regs.h
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
also update the clk definitions in pxa27x and pxa3xx.
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Changes include:
1. rename MFP_LPM_WAKEUP_ENABLE into MFP_LPM_CAN_WAKEUP to indicate
the board capability of this pin to wakeup the system
2. add gpio_set_wake() and keypad_set_wake() to allow dynamically
enable/disable wakeup from GPIOs and keypad GPIO
* these functions are currently kept in mfp-pxa2xx.c due to their
dependency to the MFP configuration
3. pxa2xx_mfp_config() only gives early warning if MFP_LPM_CAN_WAKEUP
is set on incorrect pins
So that the GPIO's wakeup capability is now decided by the following:
a) processor's capability: (only those GPIOs which have dedicated
bits within PWER/PRER/PFER can wakeup the system), this is
initialized by pxa{25x,27x}_init_mfp()
b) board design decides:
- whether the pin is designed to wakeup the system (some of
the GPIOs are configured as other functions, which is not
intended to be a wakeup source), by OR'ing the pin config
with MFP_LPM_CAN_WAKEUP
- which edge the pin is designed to wakeup the system, this
may depends on external peripherals/connections, which is
totally board specific; this is indicated by MFP_LPM_EDGE_*
c) the corresponding device's (most likely the gpio_keys.c) wakeup
attribute:
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pin configuration on pxa{25x,27x} has now separated from generic GPIO
into dedicated mfp-pxa2xx.c by this patch. The name "mfp" is borrowed
from pxa3xx and is used here to alert the difference between the two
concepts: pin configuration and generic GPIOs. A GPIO can be called
a "GPIO" _only_ when the corresponding pin is configured so.
A pin configuration on pxa{25x,27x} is composed of:
- alternate function selection (or pin mux as commonly called)
- low power state or sleep state
- wakeup enabling from low power mode
The following MFP_xxx bit definitions in mfp.h are re-used:
- MFP_PIN(x)
- MFP_AFx
- MFP_LPM_DRIVE_{LOW, HIGH}
- MFP_LPM_EDGE_*
Selecting alternate function on pxa{25x, 27x} involves configuration
of GPIO direction register GPDRx, so a new bit and MFP_DIR_{IN, OUT}
are introduced. And pin configurations are defined by the following
two macros:
- MFP_CFG_IN : for input alternate functions
- MFP_CFG_OUT : for output alternate functions
Every configuration should provide a low power state if it configured
as output using MFP_CFG_OUT(). As a general guideline, the low power
state should be decided to minimize the overall power dissipation. As
an example, it is better to drive the pin as high level in low power
mode if the GPIO is configured as an active low chip select.
Pins configured as GPIO are defined by MFP_CFG_IN(). This is to avoid
side effects when it is firstly configured as output. The actual
direction of the GPIO is configured by gpio_direction_{input, output}
Wakeup enabling on pxa{25x, 27x} is actually GPIO based wakeup, thus
the device based enable_irq_wake() mechanism is not applicable here.
E.g. invoking enable_irq_wake() with a GPIO IRQ as in the following
code to enable OTG wakeup is by no means portable and intuitive, and
it is valid _only_ when GPIO35 is configured as USB_P2_1:
enable_irq_wake( gpio_to_irq(35) );
To make things worse, not every GPIO is able to wakeup the system.
Only a small number of them can, on either rising or falling edge,
or when level is high (for keypad GPIOs).
Thus, another new bit is introduced to indicate that the GPIO will
wakeup the system:
- MFP_LPM_WAKEUP_ENABLE
The following macros can be used in platform code, and be OR'ed to
the GPIO configuration to enable its wakeup:
- WAKEUP_ON_EDGE_{RISE, FALL, BOTH}
- WAKEUP_ON_LEVEL_HIGH
The WAKEUP_ON_LEVEL_HIGH is used for keypad GPIOs _only_, there is
no edge settings for those GPIOs.
These WAKEUP_ON_* flags OR'ed on wrong GPIOs will be ignored in case
that platform code author is careless enough.
The tradeoff here is that the wakeup source is fully determined by
the platform configuration, instead of enable_irq_wake().
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
two reasons:
1. GPIO namings and their mode definitions are conceptually not part
of the PXA register definitions
2. this is actually a temporary move in the transition of PXA2xx to
use MFP-alike APIs (as what PXA3xx is now doing), so that legacy
code will still work and new code can be added in step by step
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This makes the code better organized and simplified a bit. The change
will lose a bit of performance when performing IRQ ack/mask/unmask,but
that's not too much after checking the result binary.
This patch also removes the ugly #ifdef CONFIG_PXA27x .. #endif by
carefully not to access those pxa{27x,3xx} specific registers, this
is done by keeping an internal IRQ number variable. The pxa-regs.h
is also modified so registers for IRQ > PXA_IRQ(31) are made public
even if CONFIG_PXA{27x,3xx} isn't defined (for pxa25x's sake)
The incorrect assumption in the original code that internal irq starts
from 0 is also corrected by comparing with PXA_IRQ(0).
"struct sys_device" for the IRQ are reduced into one single device on
pxa{27x,3xx}.
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Enhanced GPIO alternate functions descriptions,
taken from Intel PXA270 Developers Manual.
Signed-off-by: Robert Jarzmik <rjarzmik@free.fr>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Expose control of the PXA3xx 13MHz CLK_POUT pin via the clock API
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
There have been a few oopses caused by 'struct file's with NULL f_vfsmnts.
There was also a set of potentially missed mnt_want_write()s from
dentry_open() calls.
This patch provides a very simple debugging framework to catch these kinds of
bugs. It will WARN_ON() them, but should stop us from having any oopses or
mnt_writer count imbalances.
I'm quite convinced that this is a good thing because it found bugs in the
stuff I was working on as soon as I wrote it.
[hch: made it conditional on a debug option.
But it's still a little bit too ugly]
[hch: merged forced remount r/o fix from Dave and akpm's fix for the fix]
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Originally from: Herbert Poetzl <herbert@13thfloor.at>
This is the core of the read-only bind mount patch set.
Note that this does _not_ add a "ro" option directly to the bind mount
operation. If you require such a mount, you must first do the bind, then
follow it up with a 'mount -o remount,ro' operation:
If you wish to have a r/o bind mount of /foo on bar:
mount --bind /foo /bar
mount -o remount,ro /bar
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This is the real meat of the entire series. It actually
implements the tracking of the number of writers to a mount.
However, it causes scalability problems because there can be
hundreds of cpus doing open()/close() on files on the same mnt at
the same time. Even an atomic_t in the mnt has massive scalaing
problems because the cacheline gets so terribly contended.
This uses a statically-allocated percpu variable. All want/drop
operations are local to a cpu as long that cpu operates on the same
mount, and there are no writer count imbalances. Writer count
imbalances happen when a write is taken on one cpu, and released
on another, like when an open/close pair is performed on two
Upon a remount,ro request, all of the data from the percpu
variables is collected (expensive, but very rare) and we determine
if there are any outstanding writers to the mount.
I've written a little benchmark to sit in a loop for a couple of
seconds in several cpus in parallel doing open/write/close loops.
http://sr71.net/~dave/linux/openbench.c
The code in here is a a worst-possible case for this patch. It
does opens on a _pair_ of files in two different mounts in parallel.
This should cause my code to lose its "operate on the same mount"
optimization completely. This worst-case scenario causes a 3%
degredation in the benchmark.
I could probably get rid of even this 3%, but it would be more
complex than what I have here, and I think this is getting into
acceptable territory. In practice, I expect writing more than 3
bytes to a file, as well as disk I/O to mask any effects that this
has.
(To get rid of that 3%, we could have an #defined number of mounts
in the percpu variable. So, instead of a CPU getting operate only
on percpu data when it accesses only one mount, it could stay on
percpu data when it only accesses N or fewer mounts.)
[AV] merged fix for __clear_mnt_mount() stepping on freed vfsmount
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If someone decides to demote a file from r/w to just
r/o, they can use this same code as __fput().
NFS does just that, and will use this in the next
patch.
AV: drop write access in __fput() only after we evict from file list.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Cc: Erez Zadok <ezk@cs.sunysb.edu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J Bruce Fields" <bfields@fieldses.org>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch adds two function mnt_want_write() and mnt_drop_write(). These are
used like a lock pair around and fs operations that might cause a write to the
filesystem.
Before these can become useful, we must first cover each place in the VFS
where writes are performed with a want/drop pair. When that is complete, we
can actually introduce code that will safely check the counts before allowing
r/w<->r/o transitions to occur.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
open_namei() will, in the future, need to take mount write counts
over its creation and truncation (via may_open()) operations. It
needs to keep these write counts until any potential filp that is
created gets __fput()'d.
This gets complicated in the error handling and becomes very murky
as to how far open_namei() actually got, and whether or not that
mount write count was taken. That makes it a bad interface.
All that the current do_filp_open() really does is allocate the
nameidata on the stack, then call open_namei().
So, this merges those two functions and moves filp_open() over
to namei.c so it can be close to its buddy: do_filp_open(). It
also gets a kerneldoc comment in the process.
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they (or some user of them) rely
on it dragging in some unrelated header file, but I can't build all
these files, so we'll have to fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
security: fix up documentation for security_module_enable
Security: Introduce security= boot parameter
Audit: Final renamings and cleanup
SELinux: use new audit hooks, remove redundant exports
Audit: internally use the new LSM audit hooks
LSM/Audit: Introduce generic Audit LSM hooks
SELinux: remove redundant exports
Netlink: Use generic LSM hook
Audit: use new LSM hooks instead of SELinux exports
SELinux: setup new inode/ipc getsecid hooks
LSM: Introduce inode_getsecid and ipc_getsecid hooks
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26: (1090 commits)
[NET]: Fix and allocate less memory for ->priv'less netdevices
[IPV6]: Fix dangling references on error in fib6_add().
[NETLABEL]: Fix NULL deref in netlbl_unlabel_staticlist_gen() if ifindex not found
[PKT_SCHED]: Fix datalen check in tcf_simp_init().
[INET]: Uninline the __inet_inherit_port call.
[INET]: Drop the inet_inherit_port() call.
SCTP: Initialize partial_bytes_acked to 0, when all of the data is acked.
[netdrvr] forcedeth: internal simplifications; changelog removal
phylib: factor out get_phy_id from within get_phy_device
PHY: add BCM5464 support to broadcom PHY driver
cxgb3: Fix __must_check warning with dev_dbg.
tc35815: Statistics cleanup
natsemi: fix MMIO for PPC 44x platforms
[TIPC]: Cleanup of TIPC reference table code
[TIPC]: Optimized initialization of TIPC reference table
[TIPC]: Remove inlining of reference table locking routines
e1000: convert uint16_t style integers to u16
ixgb: convert uint16_t style integers to u16
sb1000.c: make const arrays static
sb1000.c: stop inlining largish static functions
...
Add the security= boot parameter. This is done to avoid LSM
registration clashes in case of more than one bult-in module.
User can choose a security module to enable at boot. If no
security= boot parameter is specified, only the first LSM
asking for registration will be loaded. An invalid security
module name will be treated as if no module has been chosen.
LSM modules must check now if they are allowed to register
by calling security_module_enable(ops) first. Modify SELinux
and SMACK to do so.
Do not let SMACK register smackfs if it was not chosen on
boot. Smackfs assumes that smack hooks are registered and
the initial task security setup (swapper->security) is done.
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Rename the se_str and se_rule audit fields elements to
lsm_str and lsm_rule to avoid confusion.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Setup the new Audit LSM hooks for SELinux.
Remove the now redundant exported SELinux Audit interface.
Audit: Export 'audit_krule' and 'audit_field' to the public
since their internals are needed by the implementation of the
new LSM hook 'audit_rule_known'.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Introduce a generic Audit interface for security modules
by adding the following new LSM hooks:
audit_rule_init(field, op, rulestr, lsmrule)
audit_rule_known(krule)
audit_rule_match(secid, field, op, rule, actx)
audit_rule_free(rule)
Those hooks are only available if CONFIG_AUDIT is enabled.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Remove the following exported SELinux interfaces:
selinux_get_inode_sid(inode, sid)
selinux_get_ipc_sid(ipcp, sid)
selinux_get_task_sid(tsk, sid)
selinux_sid_to_string(sid, ctx, len)
They can be substitued with the following generic equivalents
respectively:
new LSM hook, inode_getsecid(inode, secid)
new LSM hook, ipc_getsecid*(ipcp, secid)
LSM hook, task_getsecid(tsk, secid)
LSM hook, sid_to_secctx(sid, ctx, len)
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Introduce inode_getsecid(inode, secid) and ipc_getsecid(ipcp, secid)
LSM hooks. These hooks will be used instead of similar exported
SELinux interfaces.
Let {inode,ipc,task}_getsecid hooks set the secid to 0 by default
if CONFIG_SECURITY is not defined or if the hook is set to
NULL (dummy). This is done to notify the caller that no valid
secid exists.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Reviewed-by: Paul Moore <paul.moore@hp.com>
This patch adds the base files for the PB1176 platform support.
Signed-off-by: Bahadir Balban <bahadir.balban@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch adds the base files for the PB11MPCore platform support.
Signed-off-by: Bahadir Balban <bahadir.balban@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch changes the IO_ADDRESS macro for the RealView platforms to
accomodate a wider range of physical addresses on PB11MPCore.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The upcoming PB11MPCore and PB1176 have different memory maps and some
of the definitions in platform.h are no longer common. This patch
moves them to the board-eb.h file and updates their usage in
realview_eb.c.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since the PB1176 has different UART base addresses, this patch moves
the definitions form platorm.h to board-eb.h. It also modifies
uncompress.h to detect the platform type at run-time.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch moves the timer definitions from platform.h into board-eb.h
as they are different on PB11MPCore and PB1176. It also adds
timerX_va_base variables in core.c which are set by the
realview_eb_timer_init function before invoking realview_timer_init.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch moves the patch definitions into board-eb.h and
realview_eb.c (from core.c) as they are different on the PB11MPCore
and PB1176 platforms.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This is in preparation for the RealView PB11MPCore and PB1176 patches
which have different base addresses for the GIC.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch moves the SCU initialisation from __v6_setup to the
smp_prepare_cpus() function as it relies on platform-specific
settings. Changes to get_core_count() are mainly for allowing cleaner
code with the upcoming PB11MPCore patches.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch adds a prefetch abort handler similar to the data abort one
and renames the latter for consistency. Initial implementation by Paul
Brook with some renaming by Catalin Marinas.
Signed-off-by: Paul Brook <paul@codesourcery.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (137 commits)
[SCSI] iscsi: bidi support for iscsi_tcp
[SCSI] iscsi: bidi support at the generic libiscsi level
[SCSI] iscsi: extended cdb support
[SCSI] zfcp: Fix error handling for blocked unit for send FCP command
[SCSI] zfcp: Remove zfcp_erp_wait from slave destory handler to fix deadlock
[SCSI] zfcp: fix 31 bit compile warnings
[SCSI] bsg: no need to set BSG_F_BLOCK bit in bsg_complete_all_commands
[SCSI] bsg: remove minor in struct bsg_device
[SCSI] bsg: use better helper list functions
[SCSI] bsg: replace kobject_get with blk_get_queue
[SCSI] bsg: takes a ref to struct device in fops->open
[SCSI] qla1280: remove version check
[SCSI] libsas: fix endianness bug in sas_ata
[SCSI] zfcp: fix compiler warning caused by poking inside new semaphore (linux-next)
[SCSI] aacraid: Do not describe check_reset parameter with its value
[SCSI] aacraid: Fix down_interruptible() to check the return value
[SCSI] sun3_scsi_vme: add MODULE_LICENSE
[SCSI] st: rename flush_write_buffer()
[SCSI] tgt: use KMEM_CACHE macro
[SCSI] initio: fix big endian problems for auto request sense
...
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (49 commits)
[GFS2] fix assertion in log_refund()
[GFS2] fix GFP_KERNEL misuses
[GFS2] test for IS_ERR rather than 0
[GFS2] Invalidate cache at correct point
[GFS2] fs/gfs2/recovery.c: suppress warnings
[GFS2] Faster gfs2_bitfit algorithm
[GFS2] Streamline quota lock/check for no-quota case
[GFS2] Remove drop of module ref where not needed
[GFS2] gfs2_adjust_quota has broken unstuffing code
[GFS2] possible null pointer dereference fixup
[GFS2] Need to ensure that sector_t is 64bits for GFS2
[GFS2] re-support special inode
[GFS2] remove gfs2_dev_iops
[GFS2] fix file_system_type leak on gfs2meta mount
[GFS2] Allow bmap to allocate extents
[GFS2] Fix a page lock / glock deadlock
[GFS2] proper extern for gfs2/locking/dlm/mount.c:gdlm_ops
[GFS2] gfs2/ops_file.c should #include "ops_inode.h"
[GFS2] be*_add_cpu conversion
[GFS2] Fix bug where we called drop_bh incorrectly
...