Now that it no longer does an RPC ping, lockd always ends up queueing
an RPC task for the GRANT_MSG callback. But, it also requeues the block
for later attempts. Since these are hard RPC tasks, if the client we're
calling back goes unresponsive the GRANT_MSG callbacks can stack up in
the RPC queue.
Fix this by making server-side RPC clients default to soft RPC tasks.
lockd requeues the block anyway, so this should be OK.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
It's currently possible for an unresponsive NLM client to completely
lock up a server's lockd. The scenario is something like this:
1) client1 (or a process on the server) takes a lock on a file
2) client2 tries to take a blocking lock on the same file and
awaits the callback
3) client2 goes unresponsive (plug pulled, network partition, etc)
4) client1 releases the lock
...at that point the server's lockd will try to queue up a GRANT_MSG
callback for client2, but first it requeues the block with a timeout of
30s. nlm_async_call will attempt to bind the RPC client to client2 and
will call rpc_ping. rpc_ping entails a sync RPC call and if client2 is
unresponsive it will take around 60s for that to time out. Once it times
out, it's already time to retry the block and the whole process repeats.
Once in this situation, nlmsvc_retry_blocked will never return until
the host starts responding again. lockd won't service new calls.
Fix this by skipping the RPC ping on NLM RPC clients. This makes
nlm_async_call return quickly when called.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
.. and I really need to call it something else. Maybe it is time to
bring back the weasel series, since weasels always make me feel good
about a kernel.
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (30 commits)
[ARM] constify function pointer tables
[ARM] 4823/1: AT91 section fix
[ARM] 4824/1: pxa: clear RDH bit after any reset
[ARM] pxa: remove debugging PM: printk
ARM: OMAP1: Misc clean-up
ARM: OMAP1: Update defconfigs for omap1
ARM: OMAP1: Palm Tungsten E board clean-up
ARM: OMAP1: Use I2C bus registration helper for omap1
ARM: OMAP1: Remove omap_sram_idle()
ARM: OMAP1: PM fixes for OMAP1
ARM: OMAP1: Use MMC multislot structures for Siemens SX1 board
ARM: OMAP1: Make omap1 use MMC multislot structures
ARM: OMAP1: Change the comments to C style
ARM: OMAP1: Make omap1 boards to use omap_nand_platform_data
ARM: OMAP: Add helper module for board specific I2C bus registration
ARM: OMAP: Add dmtimer support for OMAP3
ARM: OMAP: Pre-3430 clean-up for dmtimer.c
ARM: OMAP: Add DMA support for chaining and 3430
ARM: OMAP: Add 24xx GPIO debounce support
ARM: OMAP: Get rid of unnecessary ifdefs in GPIO code
...
We want to allow different implementations of pci_raw_ops for standard
and extended config space on x86. Rather than clutter generic code with
knowledge of this, we make pci_raw_ops private to x86 and use it to
implement the new raw interface -- raw_pci_read() and raw_pci_write().
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thanks to Loic Prylli <loic@myri.com>, who originally proposed
this idea.
Always using legacy configuration mechanism for the legacy config space
and extended mechanism (mmconf) for the extended config space is
a simple and very logical approach. It's supposed to resolve all
known mmconf problems. It still allows per-device quirks (tweaking
dev->cfg_size). It also allows to get rid of mmconf fallback code.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 8811930dc7 ("splice: missing user
pointer access verification") added the proper access_ok() calls to
copy_from_user_mmap_sem() which ensures we can copy the struct iovecs
from userspace to the kernel.
But we also must check whether we can access the actual memory region
pointed to by the struct iovec to fix the access checks properly.
Signed-off-by: Bastian Blank <waldi@debian.org>
Acked-by: Oliver Pinter <oliver.pntr@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 954415e33e ("[PKT_SCHED] ematch:
tcf_em_destroy robustness") removed a cast on em->data when
passing it to kfree(), but em->data is an integer type that can
hold pointers as well as other values so the cast is necessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
hrtimer_nanosleep_restart() clears/restores restart_block->fn. This is
pointless and complicates its usage. Note that if sys_restart_syscall()
doesn't actually happen, we have a bogus "pending" restart->fn anyway,
this is harmless.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Pavel Emelyanov <xemul@sw.ru>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Toyo Abe <toyoa@mvista.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Spotted by Pavel Emelyanov and Alexey Dobriyan.
compat_sys_nanosleep() implicitly uses hrtimer_nanosleep_restart(), this can't
work. Make a suitable compat_nanosleep_restart() helper.
Introduced by commit c70878b4e0
hrtimer: hook compat_sys_nanosleep up to high res timer code
Also, set ->addr_limit = KERNEL_DS before doing hrtimer_nanosleep(), this func
was changed by the previous patch and now takes the "__user *" parameter.
Thanks to Ingo Molnar for fixing the bug in this patch.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Pavel Emelyanov <xemul@sw.ru>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Toyo Abe <toyoa@mvista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Spotted by Pavel Emelyanov and Alexey Dobriyan.
hrtimer_nanosleep() sets restart_block->arg1 = rmtp, but this rmtp points to
the local variable which lives in the caller's stack frame. This means that
if sys_restart_syscall() actually happens and it is interrupted as well, we
don't update the user-space variable, but write into the already dead stack
frame.
Introduced by commit 04c227140f
hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier
Change the callers to pass "__user *rmtp" to hrtimer_nanosleep(), and change
hrtimer_nanosleep() to use copy_to_user() to actually update *rmtp.
Small problem remains. man 2 nanosleep states that *rtmp should be written if
nanosleep() was interrupted (it says nothing whether it is OK to update *rmtp
if nanosleep returns 0), but (with or without this patch) we can dirty *rem
even if nanosleep() returns 0.
NOTE: this patch doesn't change compat_sys_nanosleep(), because it has other
bugs. Fixed by the next patch.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Pavel Emelyanov <xemul@sw.ru>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Toyo Abe <toyoa@mvista.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
include/linux/hrtimer.h | 2 -
kernel/hrtimer.c | 51 +++++++++++++++++++++++++-----------------------
kernel/posix-timers.c | 14 +------------
3 files changed, 30 insertions(+), 37 deletions(-)
clocksource initialization and error accumulation. This corrects a 280ppm
drift seen on some systems using acpi_pm, and affects other clocksources as
well (likely to a lesser degree).
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
htb_requeue() enqueues skbs for which htb_classify() returns NULL.
This is wrong because such skbs could be handled by NET_CLS_ACT code,
and the decision could be different than earlier in htb_enqueue().
So htb_requeue() is changed to work and look more like htb_enqueue().
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces the explicit usage of the magic constant "1024"
with IP6_RT_PRIO_USER in the IPV6 tree.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gcc-3.4.4 on powerpc:
drivers/net/starfire.c:219: error: version causes a section type conflict
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Andrew Morton <akpm@linux-foundation.org>
gcc-3.4.4 on powerpc:
drivers/net/via-velocity.c:443: error: chip_info_table causes a section type conflict
on this one I had to remove the __devinitdata too. Don't know why.
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
gcc-3.4.4 on powerpc:
drivers/net/natsemi.c:245: error: natsemi_pci_info causes a section type conflict
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
gcc-3.4.4 on powerpc:
drivers/net/typhoon.c:137: error: version causes a section type conflict
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix following warnings:
WARNING: drivers/isdn/hisax/built-in.o(.text+0x19723): Section mismatch in reference from the function ISACVersion() to the variable .devinit.data:ISACVer
WARNING: drivers/isdn/hisax/built-in.o(.text+0x2005b): Section mismatch in reference from the function setup_avm_a1_pcmcia() to the function .devinit.text:setup_isac()
ISACVer were only used from function annotated __devinit
so add same annotation to ISACVer.
One af the fererencing functions missed __devinit so add it
and kill an additional warning.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Karsten Keil <kkeil@suse.de>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following warnings:
WARNING: drivers/isdn/hisax/built-in.o(.text+0x722): Section mismatch in reference from the function hisax_cs_setup_card() to the function .devinit.text:setup_teles3()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x72c): Section mismatch in reference from the function hisax_cs_setup_card() to the function .devinit.text:setup_s0box()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x736): Section mismatch in reference from the function hisax_cs_setup_card() to the function .devinit.text:setup_telespci()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x747): Section mismatch in reference from the function hisax_cs_setup_card() to the function .devinit.text:setup_avm_pcipnp()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x74e): Section mismatch in reference from the function hisax_cs_setup_card() to the function .devinit.text:setup_elsa()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x755): Section mismatch in reference from the function hisax_cs_setup_card() to the function .devinit.text:setup_diva()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x75c): Section mismatch in reference from the function hisax_cs_setup_card() to the function .devinit.text:setup_sedlbauer()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x763): Section mismatch in reference from the function hisax_cs_setup_card() to the function .devinit.text:setup_netjet_s()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x76a): Section mismatch in reference from the function hisax_cs_setup_card() to the function .devinit.text:setup_hfcpci()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x771): Section mismatch in reference from the function hisax_cs_setup_card() to the function .devinit.text:setup_hfcsx()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x778): Section mismatch in reference from the function hisax_cs_setup_card() to the function .devinit.text:setup_niccy()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x77f): Section mismatch in reference from the function hisax_cs_setup_card() to the function .devinit.text:setup_bkm_a4t()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x786): Section mismatch in reference from the function hisax_cs_setup_card() to the function .devinit.text:setup_sct_quadro()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x78d): Section mismatch in reference from the function hisax_cs_setup_card() to the function .devinit.text:setup_gazel()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x794): Section mismatch in reference from the function hisax_cs_setup_card() to the function .devinit.text:setup_w6692()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x79b): Section mismatch in reference from the function hisax_cs_setup_card() to the function .devinit.text:setup_netjet_u()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x7a2): Section mismatch in reference from the function hisax_cs_setup_card() to the function .devinit.text:setup_enternow_pci()
checkcard() are the only user of hisax_cs_setup_card().
And checkcard is only used during init or when hot plugging
ISDN devices. So annotate hisax_cs_setup_card() with __devinit.
checkcard() is used by exported functions so it cannot be
annotated __devinit. Annotate it with __ref so modpost
ignore references to _devinit section.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Karsten Keil <kkeil@suse.de>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following warnings:
WARNING: drivers/isdn/hisax/built-in.o(.text+0x1b276): Section mismatch in reference from the function inithscxisac() to the function .devinit.text:clear_pending_isac_ints()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x1b286): Section mismatch in reference from the function inithscxisac() to the function .devinit.text:initisac()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x1fec7): Section mismatch in reference from the function AVM_card_msg() to the function .devinit.text:clear_pending_isac_ints()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x21669): Section mismatch in reference from the function AVM_card_msg() to the function .devinit.text:clear_pending_isac_ints()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x21671): Section mismatch in reference from the function AVM_card_msg() to the function .devinit.text:initisac()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x2991e): Section mismatch in reference from the function Sedl_card_msg() to the function .devinit.text:clear_pending_isac_ints()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x29936): Section mismatch in reference from the function Sedl_card_msg() to the function .devinit.text:initisac()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x2993e): Section mismatch in reference from the function Sedl_card_msg() to the function .devinit.text:initisar()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x2e026): Section mismatch in reference from the function NETjet_S_card_msg() to the function .devinit.text:clear_pending_isac_ints()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x2e02e): Section mismatch in reference from the function NETjet_S_card_msg() to the function .devinit.text:initisac()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x37813): Section mismatch in reference from the function BKM_card_msg() to the function .devinit.text:clear_pending_isac_ints()
WARNING: drivers/isdn/hisax/built-in.o(.text+0x37823): Section mismatch in reference from the function BKM_card_msg() to the function .devinit.text:initisac()
initisar(), initisac() and clear_pending_isac_ints()
were all used via a cardmsg fnction - which may be called
ouside __devinit context.
So remove the bogus __devinit annotation of the
above three functions to fix the warnings.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Karsten Keil <kkeil@suse.de>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Add new "development flag" to the ext4 filesystem
ext4: Don't panic in case of corrupt bitmap
ext4: allocate struct ext4_allocation_context from a kmem cache
JBD2: Clear buffer_ordered flag for barried IO request on success
ext4: Fix Direct I/O locking
ext4: Fix circular locking dependency with migrate and rm.
allow in-inode EAs on ext4 root inode
ext4: Fix null bh pointer dereference in mballoc
ext4: Don't set EXTENTS_FL flag for fast symlinks
JBD2: Use the incompat macro for testing the incompat feature.
jbd2: Fix reference counting on the journal commit block's buffer head
[PATCH] jbd: Remove useless loop when writing commit record
jbd2: Add error check to journal_wait_on_commit_record to avoid oops
Fix the following warning:
WARNING: drivers/isdn/hisax/built-in.o(.text+0x35818): Section mismatch in reference from the function hfcsx_card_msg() to the function .devinit.text:inithfcsx()
hfcsx_card_msg() may be called outside __devinit context.
Following the program logic is looks like the CARD_INIT branch
will only be taken under __devinit context but to be consistent
remove the __devinit annotation of inithfcsx() so we
do not mix non-__devinit and __devinit code.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Karsten Keil <kkeil@suse.de>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the code in tcf_em_tree_destroy more robust and cleaner:
* Don't need to cast pointer to kfree() or avoid passing NULL.
* After freeing the tree, clear the pointer to avoid possible problems
from repeated free.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A couple of functions in meta match don't need to be inline.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes the code use a good proc API and the text ~50 bytes shorter.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCPT already depends in INET, so this doesn't create additional
dependencies.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The seq files API disposes the caller of the difficulty of
checking file position, the length of data to produce and
the size of provided buffer.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mainly this removes ifdef-s from inside the ipsec_pfkey_init.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/pppol2tp.c: In function `pppol2tp_seq_tunnel_show':
drivers/net/pppol2tp.c:2295: warning: long long unsigned int format, __u64 arg (arg 4)
drivers/net/pppol2tp.c:2295: warning: long long unsigned int format, __u64 arg (arg 5)
drivers/net/pppol2tp.c:2295: warning: long long unsigned int format, __u64 arg (arg 6)
drivers/net/pppol2tp.c:2295: warning: long long unsigned int format, __u64 arg (arg 7)
drivers/net/pppol2tp.c:2295: warning: long long unsigned int format, __u64 arg (arg 8)
drivers/net/pppol2tp.c:2295: warning: long long unsigned int format, __u64 arg (arg 9)
drivers/net/pppol2tp.c: In function `pppol2tp_seq_session_show':
drivers/net/pppol2tp.c:2328: warning: long long unsigned int format, __u64 arg (arg 5)
drivers/net/pppol2tp.c:2328: warning: long long unsigned int format, __u64 arg (arg 6)
drivers/net/pppol2tp.c:2328: warning: long long unsigned int format, __u64 arg (arg 7)
drivers/net/pppol2tp.c:2328: warning: long long unsigned int format, __u64 arg (arg 8)
drivers/net/pppol2tp.c:2328: warning: long long unsigned int format, __u64 arg (arg 9)
drivers/net/pppol2tp.c:2328: warning: long long unsigned int format, __u64 arg (arg 10)
Not all platforms implement u64 with unsigned long long. eg: powerpc.
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
gcc-3.4.4 on powerpc:
drivers/net/bnx2.c:67: error: version causes a section type conflict
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Andrew Morton <akpm@linux-foundation.org>
gcc-3.4.4 on powerpc:
drivers/net/bnx2x.c:73: error: version causes a section type conflict
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This flag is simply a generic "this is a crash/burn test filesystem"
marker. If it is set, then filesystem code which is "in development"
will be allowed to mount the filesystem. Filesystem code which is not
considered ready for prime-time will check for this flag, and if it is
not set, it will refuse to touch the filesystem.
As we start rolling ext4 out to distro's like Fedora, et. al, this makes
it less likely that a user might accidentally start using ext4 on a
production filesystem; a bad thing, since that will essentially make it
be unfsckable until e2fsprogs catches up.
Signed-off-by: Theodore Tso <tytso@MIT.EDU>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Multiblock allocator calls BUG_ON in many case if the free and used
blocks count obtained looking at the bitmap is different from what
the allocator internally accounted for. Use ext4_error in such case
and don't panic the system.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
struct ext4_allocation_context is rather large, and this bloats
the stack of many functions which use it. Allocating it from
a named slab cache will alleviate this.
For example, with this change (on top of the noinline patch sent earlier):
-ext4_mb_new_blocks 200
+ext4_mb_new_blocks 40
-ext4_mb_free_blocks 344
+ext4_mb_free_blocks 168
-ext4_mb_release_inode_pa 216
+ext4_mb_release_inode_pa 40
-ext4_mb_release_group_pa 192
+ext4_mb_release_group_pa 24
Most of these stack-allocated structs are actually used only for
mballoc history; and in those cases often a smaller struct would do.
So changing that may be another way around it, at least for those
functions, if preferred. For now, in those cases where the ac
is only for history, an allocation failure simply skips the history
recording, and does not cause any other failures.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In JBD2 jbd2_journal_write_commit_record(), clear the buffer_ordered
flag for the bh after barried IO has succeed. This prevents later, if
the same buffer head were submitted to the underlying device, which has
been reconfigured to not support barrier request, the JBD2 commit code
could treat it as a normal IO (without barrier).
This is a port from JBD/ext3 fix from Neil Brown.
More details from Neil:
Some devices - notably dm and md - can change their behaviour in
response to BIO_RW_BARRIER requests. They might start out accepting
such requests but on reconfiguration, they find out that they cannot
any more. JBD2 deal with this by always testing if BIO_RW_BARRIER
requests fail with EOPNOTSUPP, and retrying the write
requests without the barrier (probably after waiting for any pending
writes to complete).
However there is a bug in the handling this in JBD2 for ext4 .
When ext4/JBD2 to submit a BIO_RW_BARRIER request,
it sets the buffer_ordered flag on the buffer head.
If the request completes successfully, the flag STAYS SET.
Other code might then write the same buffer_head after the device has
been reconfigured to not accept barriers. This write will then fail,
but the "other code" is not ready to handle EOPNOTSUPP errors and the
error will be treated as fatal.
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We cannot start transaction in ext4_direct_IO() and just let it last
during the whole write because dio_get_page() acquires mmap_sem which
ranks above transaction start (e.g. because we have dependency chain
mmap_sem->PageLock->journal_start, or because we update atime while
holding mmap_sem) and thus deadlocks could happen. We solve the problem
by starting a transaction separately for each ext4_get_block() call.
We *could* have a problem that we allocate a block and before its data
are written out the machine crashes and thus we expose stale data. But
that does not happen because for hole-filling generic code falls back to
buffered writes and for file extension, we add inode to orphan list and
thus in case of crash, journal replay will truncate inode back to the
original size.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>