This patch adds support for systems that cannot receive every interrupt on a
single cpu simultaneously, in the check to see if we have enough HARDIRQ_BITS.
MAX_HARDIRQS_PER_CPU becomes the count of the maximum number of hardare
generated interrupts per cpu.
On architectures that support per cpu interrupt delivery this can be a
significant space savings and scalability bonus.
This patch adds support for systems that cannot receive every interrupt on
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Because of the nasty way that CONFIG_PCI_MSI was implemented we wound up with
set_irq_info and set_native_irq_info, with move_irq and move_native_irq. Both
functions did the same thing but they were built and called under different
circumstances. Now that the msi hacks are gone we can kill move_irq and
set_irq_info.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes the change in behavior of the irq allocation code when
CONFIG_PCI_MSI is defined. Removing all instances of the assumption that irq
== vector.
create_irq is rewritten to first allocate a free irq and then to assign that
irq a vector.
assign_irq_vector is made static and the AUTO_ASSIGN case which allocates an
vector not bound to an irq is removed.
The ioapic vector methods are removed, and everything now works with irqs.
The definition of NR_IRQS no longer depends on CONFIG_PCI_MSI
[akpm@osdl.org: cleanup]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes the change in behavior of the irq allocation code when
CONFIG_PCI_MSI is defined. Removing all instances of the assumption that irq
== vector.
create_irq is rewritten to first allocate a free irq and then to assign that
irq a vector.
assign_irq_vector is made static and the AUTO_ASSIGN case which allocates an
vector not bound to an irq is removed.
The ioapic vector methods are removed, and everything now works with irqs.
The definition of NR_IRQS no longer depends on CONFIG_PCI_MSI
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
After the previous changes ia64 is the only architecture useing msi-apic.c
[akpm@osdl.org: unbreak MSI on ia64]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This removes the hardcoded assumption that irq == vector in the msi
composition code, and it allows the msi message composition to setup logical
mode, or lowest priorirty delivery mode as we do for other apic interrupts,
and with the same selection criteria.
Basically this moves the problem of what is in the msi message into the
architecture irq management code where it belongs. Not in a generic layer
that doesn't have enough information to compose msi messages properly.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This removes the hardcoded assumption that irq == vector in the msi
composition code, and it allows the msi message composition to setup logical
mode, or lowest priorirty delivery mode as we do for other apic interrupts,
and with the same selection criteria.
Basically this moves the problem of what is in the msi message into the
architecture irq management code where it belongs. Not in a generic layer
that doesn't have enough information to compose msi messages properly.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The msi currently allocates irqs backwards. First it allocates a platform
dependent routing value for an interrupt the ``vector'' and then it figures
out from the vector which irq you are on.
For ia64 this is fine. For x86 and x86_64 this is complete nonsense and makes
an enourmous mess of the irq handling code and prevents some pretty
significant cleanups in the code for handling large numbers of irqs.
This patch refactors msi.c to work in terms of irqs and create_irq/destroy_irq
for dynamically managing irqs.
Hopefully this is finally a version of msi.c that is useful on more than just
x86 derivatives.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current implementation of create_irq() is a hack but it is the current
hack that msi.c uses, and unfortunately the ``generic'' apic msi ops depend on
this hack. Thus we are this hack of assuming irq == vector until the
depencencies in the generic irq code are removed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current implementation of create_irq() is a hack but it is the current
hack that msi.c uses, and unfortunately the ``generic'' apic msi ops depend on
this hack. Thus we are stuck this hack of assuming irq == vector until the
depencencies in the generic msi code are removed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With the msi support comes a new concept in irq handling, irqs that are
created dynamically at run time.
Currently the msi code allocates irqs backwards. First it allocates a
platform dependent routing value for an interrupt the ``vector'' and then it
figures out from the vector which irq you are on.
This msi backwards allocator suffers from two basic problems. The allocator
suffers because it is trying to do something that is architecture specific in
a generic way making it brittle, inflexible, and tied to tightly to the
architecture implementation. The alloctor also suffers from it's very
backwards nature as it has tied things together that should have no
dependencies.
To solve the basic dynamic irq allocation problem two new architecture
specific functions are added: create_irq and destroy_irq.
create_irq takes no input and returns an unused irq number, that won't be
reused until it is returned to the free poll with destroy_irq. The irq then
can be used for any purpose although the only initial consumer is the msi
code.
destroy_irq takes an irq number allocated with create_irq and returns it to
the free pool.
Making this functionality per architecture increases the simplicity of the irq
allocation code and increases it's flexibility.
dynamic_irq_init() and dynamic_irq_cleanup() are added to automate the
irq_desc initializtion that should happen for dynamic irqs.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently we attempt to predict how many irqs we will be able to allocate with
msi using pci_vector_resources and some complicated accounting, and then we
only allow each device as many irqs as we think are available on average.
Only the s2io driver even takes advantage of this feature all other drivers
have a fixed number of irqs they need and bail if they can't get them.
pci_vector_resources is inaccurate if anyone ever frees an irq. The whole
implmentation is racy. The current irq limit policy does not appear to make
sense with current drivers. So I have simplified things. We can revisit this
we we need a more sophisticated policy.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current msi_ops are short sighted in a number of ways, this patch attempts
to fix the glaring deficiences.
- Report in msi_ops if a 64bit address is needed in the msi message, so we
can fail 32bit only msi structures.
- Send and receive a full struct msi_msg in both setup and target. This is
a little cleaner and allows for architectures that need to modify the data
to retarget the msi interrupt to a different cpu.
- In target pass in the full cpu mask instead of just the first cpu in case
we can make use of the full cpu mask.
- Operate in terms of irqs and not vectors, currently there is still a 1-1
relationship but on architectures other than ia64 I expect this will change.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In support of this I also add a struct msi_msg that captures the the two
address and one data field ina typical msi message, and I remember the pos and
if the address is 64bit in struct msi_desc.
This makes the code a little more readable and easier to maintain, and paves
the way to further simplfications.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This allows the output of the msi tests to be stored directly in a bit field.
If you don't do this a value greater than one will be truncated and become 0.
Changing true to false with bizare consequences.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The problem. Because the disable routines leave the msi interrupts in all
sorts of half enabled states the enable routines become impossible to
implement correctly, and almost impossible to understand.
Simplifing this allows me to simply kill the buggy reroute_msix_table, and
generally makes the code more maintainable.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the latest changes the code for migrating x86_64 irqs was dropped. This
reads it in a fashion that will work even if we change the vector on level
triggered irqs when we migrate them.
[akpm@osdl.org: build fix]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently move_native_irq disables and renables the irq we are migrating to
ensure we don't take that irq when we are actually doing the migration
operation. Disabling the irq needs to happen but sometimes doing the work is
move_native_irq is too late.
On x86 with ioapics the irq move sequences needs to be:
edge_triggered:
mask irq.
move irq.
unmask irq.
ack irq.
level_triggered:
mask irq.
ack irq.
move irq.
unmask irq.
We can easily perform the edge triggered sequence, with the current defintion
of move_native_irq. However the level triggered case does not map well. For
that I have added move_masked_irq, to allow me to disable the irqs around both
the ack and the move.
Q: Why have we not seen this problem earlier?
A: The only symptom I have been able to reproduce is that if we change
the vector before acknowleding an irq the wrong irq is acknowledged.
Since we currently are not reprogramming the irq vector during
migration no problems show up.
We have to mask the irq before we acknowledge the irq or else we could
hit a window where an irq is asserted just before we acknowledge it.
Edge triggered irqs do not have this problem because acknowledgements
do not propogate in the same way.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The primary aim of this patchset is to remove maintenances problems caused by
the irq infrastructure. The two big issues I address are an artificially
small cap on the number of irqs, and that MSI assumes vector == irq. My
primary focus is on x86_64 but I have touched other architectures where
necessary to keep them from breaking.
- To increase the number of irqs I modify the code to look at the (cpu,
vector) pair instead of just looking at the vector.
With a large number of irqs available systems with a large irq count no
longer need to compress their irq numbers to fit. Removing a lot of brittle
special cases.
For acpi guys the result is that irq == gsi.
- Addressing the fact that MSI assumes irq == vector takes a few more
patches. But suffice it to say when I am done none of the generic irq code
even knows what a vector is.
In quick testing on a large Unisys x86_64 machine we stumbled over at least
one driver that assumed that NR_IRQS could always fit into an 8 bit number.
This driver is clearly buggy today. But this has become a class of bugs that
it is now much easier to hit.
This patch:
This is a minor space optimization. In practice I don't think this has any
affect because of our alignment constraints and the other fields but there is
not point in chewing up an uncessary word and since we already read the flag
field this should improve the cache hit ratio of the irq handler.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch converts all the i386 PIC controllers (except VisWS and Voyager,
which I could not test - but which should still work as old-style IRQ layers)
to the new and simpler irq-chip interrupt handling layer.
[akpm@osdl.org: build fix]
[mingo@elte.hu: enable fasteoi handler for i386 level-triggered IO-APIC irqs]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch converts all the x86_64 PIC controllers layers to the new and
simpler irq-chip interrupt handling layer.
[mingo@elte.hu: The patch also enables the fasteoi handler for x86_64]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
drivers/video/riva/fbdev.c: In function `riva_get_EDID_OF':
drivers/video/riva/fbdev.c:1846: warning: assignment discards qualifiers from pointer target type
This code is being bad: copying a pointer to read-only OF data into a
non-const pointer.
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
eCryptfs is a stacked cryptographic filesystem for Linux. It is derived from
Erez Zadok's Cryptfs, implemented through the FiST framework for generating
stacked filesystems. eCryptfs extends Cryptfs to provide advanced key
management and policy features. eCryptfs stores cryptographic metadata in the
header of each file written, so that encrypted files can be copied between
hosts; the file will be decryptable with the proper key, and there is no need
to keep track of any additional information aside from what is already in the
encrypted file itself.
[akpm@osdl.org: updates for ongoing API changes]
[bunk@stusta.de: cleanups]
[akpm@osdl.org: alpha build fix]
[akpm@osdl.org: cleanups]
[tytso@mit.edu: inode-diet updates]
[pbadari@us.ibm.com: generic_file_*_read/write() interface updates]
[rdunlap@xenotime.net: printk format fixes]
[akpm@osdl.org: make slab creation and teardown table-driven]
Signed-off-by: Phillip Hellewell <phillip@hellewell.homeip.net>
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
make headers_check fails on linux/nfsd/const.h.
Since linux/sunrpc/msg_prot.h does not seem to export anything interesting
for userspace, this patch moves it in the __KERNEL__ protected section.
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use all the pieces set up so far to implement referral support, allowing
return of NFS4ERR_MOVED and fs_locations attribute.
Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Encode fs_locations attribute.
Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Define FS locations structures, some functions to manipulate them, and add
code to parse FS locations in downcall and add to the exports structure.
[bfields@fieldses.org: bunch of fixes and cleanups]
Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Store the export path in the svc_export structure instead of storing only the
dentry. This will prevent the need for additional d_path calls to provide
NFSv4 fs_locations support.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is a possible race in d_splice_alias. Though __d_find_alias(inode, 1)
will only return a dentry with DCACHE_DISCONNECTED set, it is possible for it
to get cleared before the BUG_ON, and it is is not possible to lock against
that.
There are a couple of problems here. Firstly, the code doesn't match the
comment. The comment describes a 'disconnected' dentry as being IS_ROOT as
well as DCACHE_DISCONNECTED, however there is not testing of IS_ROOT anythere.
A dentry is marked DCACHE_DISCONNECTED when allocated with d_alloc_anon, and
remains DCACHE_DISCONNECTED while a path is built up towards the root. So a
dentry can have a valid name and a valid parent and even grandparent, but will
still be DCACHE_DISCONNECTED until a path to the root is created. Once the
path to the root is complete, everything in the path gets DCACHE_DISCONNECTED
cleared. So the fact that DCACHE_DISCONNECTED isn't enough to say that a
dentry is free to be spliced in with a given name. This can only be allowed
if the dentry does not yet have a name, so the IS_ROOT test is needed too.
However even adding that test to __d_find_alias isn't enough. As
d_splice_alias drops dcache_lock before calling d_move to perform the splice,
it could race with another thread calling d_splice_alias to splice the inode
in with a different name in a different part of the tree (in the case where a
file has hard links). So that splicing code is only really safe for
directories (as we know that directories only have one link). For
directories, the caller of d_splice_alias will be holding i_mutex on the
(unique) parent so there is no room for a race.
A consequence of this is that a non-directory will never benefit from being
spliced into a pre-exisiting dentry, but that isn't a problem. It is
perfectly OK for a non-directory to have multiple dentries, some anonymous,
some not. And the comment for d_splice_alias says that it only happens for
directories anyway.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
totalram is measured in pages, not bytes, so PAGE_SHIFT must be used when
trying to find 1/4096 of RAM.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If nlm_lookup_host finds what it is looking for it exits with an extra
reference on the matching 'nsm' structure.
So don't actually count the reference until we are (fairly) sure it is going
to be used.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is legal to have zero-length NFSv4 acls; they just deny everything.
Also, nfs4_acl_nfsv4_to_posix will always return with pacl and dpacl set on
success, so the caller doesn't need to check this.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's no need to handle the case where the caller passes in null for pacl or
dpacl; no caller does that, because it would be a dumb thing to do.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We can be a little more flexible about the flags allowed for inheritance (in
particular, we can deal with either the presence or the absence of
INHERIT_ONLY), but we should probably reject other combinations that we don't
understand.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use a different nfsv4->(draft posix) acl mapping which is
1. completely backwards compatible,
2. accepts any nfsv4 acl, and
3. errs on the side of restricting permissions.
In detail:
1. completely backwards compatible: The new mapping produces the
same result on any acl produced by the existing (draft
posix)->nfsv4 mapping; the one exception is that we no longer
attempt to guess the value of the mask by assuming certain denies
represent the mask. Since the server still keeps track of the mask
locally, sequences of chmod's will still be handled fine; the only
thing this will change is sequences of chmod's with intervening
read-modify-writes of the acl. That last case just isn't worth the
trouble and the possible misrepresentations of the user's intent
(if we guess that a certain deny indicates masking is in effect
when it really isn't).
2. accepts any nfsv4 acl: That's not quite true: we still reject
acls that use combinations of inheritance flags that we don't
support. We also reject acls that attempt to explicitly deny
read_acl or read_attributes permissions, or that attempt to deny
write_acl or write_attributes permissions to the owner of the file.
3. errs on the side of restricting permissions: one exception to
this last rule: we totally ignore some bits (write_owner,
synchronize, read_named_attributes, etc.) that are completely alien
to our filesystem semantics, in some cases even if that would mean
ignoring an explicit deny that we have no intention of enforcing.
Excepting that, the posix acl produced should be the most
permissive acl that is not more permissive than the given nfsv4
acl.
And the new code's shorter, too. Neato.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The previous patch enables some minor simplification here.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We could be using more common code in exp_pseudoroot(). This will also
simplify some changes we need to make later.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The rpc reply has multiple levels of error returns. The code here contributes
to the confusion by using "accept_statp" for a pointer to what the rfc (and
wireshark, etc.) refer to as the "reply_stat". (The confusion is compounded
by the fact that the rfc also has an "accept_stat" which follows the
reply_stat in the succesful case.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If the request is denied after gss_accept was called, we shouldn't try to wrap
the reply. We were checking the accept_stat but not the reply_stat.
To check the reply_stat in _release, we need a pointer to before (rather than
after) the verifier, so modify body_start appropriately.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Factor out some common code from the integrity and privacy cases.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Both the (recently introduces) nsm_sema and the older f_sema are converted
over.
Cc: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The NFSACL patches introduced support for multiple RPC services listening on
the same transport. However, only the first of these services was registered
with portmapper. This was perfectly fine for nfsacl, as you traditionally do
not want these to show up in a portmapper listing.
The patch below changes the default behavior to always register all services
listening on a given transport, but retains the old behavior for nfsacl
services.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
nlmclnt_recovery would try to force a portmap rebind by setting
host->h_nextrebind to 0. The right thing to do here is to set it to the
current time.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Every NLM call includes the client's NSM state. Currently, the Linux client
always reports 0 - which seems not to cause any problems, but is not what the
protocol says.
This patch exposes the kernel's internal variable to user space via a sysctl,
which can be set at system boot time by statd.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When we send a GRANTED_MSG call, we current copy the NLM cookie provided in
the original LOCK call - because in 1996, some broken clients seemed to rely
on this bug. However, this means the cookies are not unique, so that when the
client's GRANTED_RES message comes back, we cannot simply match it based on
the cookie, but have to use the client's IP address in addition. Which breaks
when you have a multi-homed NFS client.
The X/Open spec explicitly mentions that clients should not expect the same
cookie; so one may hope that any clients that were broken in 1996 have either
been fixed or rendered obsolete.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The way we incremented the NLM cookie in nlmclnt_next_cookie was not thread
safe. This patch changes the counter to an atomic_t
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the nsm_use_hostnames sysctl and module param. If set, lockd
will use the client's name (as given in the NLM arguments) to find the NSM
handle. This makes recovery work when the NFS peer is multi-homed, and the
reboot notification arrives from a different IP than the original lock calls.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As a result of previous patches, the loop in nlmsvc_invalidate_all just sets
h_expires for all client/hosts to 0 (though does it in a very complicated
way).
This was possibly meant to trigger early garbage collection but half the time
'0' is in the future and so it infact delays garbage collection.
Pre-aging the 'hosts' is not really needed at this point anyway so we throw
out the loop and nlm_find_client which is no longer needed.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch moves the host destruction code out of nlm_host_gc into a function
of its own.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>