android_kernel_xiaomi_sm8350/Documentation
David Howells 201a15428b FS-Cache: Handle pages pending storage that get evicted under OOM conditions
Handle netfs pages that the vmscan algorithm wants to evict from the pagecache
under OOM conditions, but that are waiting for write to the cache.  Under these
conditions, vmscan calls the releasepage() function of the netfs, asking if a
page can be discarded.

The problem is typified by the following trace of a stuck process:

	kslowd005     D 0000000000000000     0  4253      2 0x00000080
	 ffff88001b14f370 0000000000000046 ffff880020d0d000 0000000000000007
	 0000000000000006 0000000000000001 ffff88001b14ffd8 ffff880020d0d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff880020d0d2a8
	Call Trace:
	 [<ffffffffa00782d8>] __fscache_wait_on_page_write+0x8b/0xa7 [fscache]
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffffa0078240>] ? __fscache_check_page_write+0x63/0x70 [fscache]
	 [<ffffffffa00b671d>] nfs_fscache_release_page+0x4e/0xc4 [nfs]
	 [<ffffffffa00927f0>] nfs_release_page+0x3c/0x41 [nfs]
	 [<ffffffff810885d3>] try_to_release_page+0x32/0x3b
	 [<ffffffff81093203>] shrink_page_list+0x316/0x4ac
	 [<ffffffff8109372b>] shrink_inactive_list+0x392/0x67c
	 [<ffffffff813532fa>] ? __mutex_unlock_slowpath+0x100/0x10b
	 [<ffffffff81058df0>] ? trace_hardirqs_on_caller+0x10c/0x130
	 [<ffffffff8135330e>] ? mutex_unlock+0x9/0xb
	 [<ffffffff81093aa2>] shrink_list+0x8d/0x8f
	 [<ffffffff81093d1c>] shrink_zone+0x278/0x33c
	 [<ffffffff81052d6c>] ? ktime_get_ts+0xad/0xba
	 [<ffffffff81094b13>] try_to_free_pages+0x22e/0x392
	 [<ffffffff81091e24>] ? isolate_pages_global+0x0/0x212
	 [<ffffffff8108e743>] __alloc_pages_nodemask+0x3dc/0x5cf
	 [<ffffffff81089529>] grab_cache_page_write_begin+0x65/0xaa
	 [<ffffffff8110f8c0>] ext3_write_begin+0x78/0x1eb
	 [<ffffffff81089ec5>] generic_file_buffered_write+0x109/0x28c
	 [<ffffffff8103cb69>] ? current_fs_time+0x22/0x29
	 [<ffffffff8108a509>] __generic_file_aio_write+0x350/0x385
	 [<ffffffff8108a588>] ? generic_file_aio_write+0x4a/0xae
	 [<ffffffff8108a59e>] generic_file_aio_write+0x60/0xae
	 [<ffffffff810b2e82>] do_sync_write+0xe3/0x120
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810b18e1>] ? __dentry_open+0x1a5/0x2b8
	 [<ffffffff810b1a76>] ? dentry_open+0x82/0x89
	 [<ffffffffa00e693c>] cachefiles_write_page+0x298/0x335 [cachefiles]
	 [<ffffffffa0077147>] fscache_write_op+0x178/0x2c2 [fscache]
	 [<ffffffffa0075656>] fscache_op_execute+0x7a/0xd1 [fscache]
	 [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1
	 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308
	 [<ffffffff8104be91>] kthread+0x7a/0x82
	 [<ffffffff8100beda>] child_rip+0xa/0x20
	 [<ffffffff8100b87c>] ? restore_args+0x0/0x30
	 [<ffffffff8102ef83>] ? tg_shares_up+0x171/0x227
	 [<ffffffff8104be17>] ? kthread+0x0/0x82
	 [<ffffffff8100bed0>] ? child_rip+0x0/0x20

In the above backtrace, the following is happening:

 (1) A page storage operation is being executed by a slow-work thread
     (fscache_write_op()).

 (2) FS-Cache farms the operation out to the cache to perform
     (cachefiles_write_page()).

 (3) CacheFiles is then calling Ext3 to perform the actual write, using Ext3's
     standard write (do_sync_write()) under KERNEL_DS directly from the netfs
     page.

 (4) However, for Ext3 to perform the write, it must allocate some memory, in
     particular, it must allocate at least one page cache page into which it
     can copy the data from the netfs page.

 (5) Under OOM conditions, the memory allocator can't immediately come up with
     a page, so it uses vmscan to find something to discard
     (try_to_free_pages()).

 (6) vmscan finds a clean netfs page it might be able to discard (possibly the
     one it's trying to write out).

 (7) The netfs is called to throw the page away (nfs_release_page()) - but it's
     called with __GFP_WAIT, so the netfs decides to wait for the store to
     complete (__fscache_wait_on_page_write()).

 (8) This blocks a slow-work processing thread - possibly against itself.

The system ends up stuck because it can't write out any netfs pages to the
cache without allocating more memory.

To avoid this, we make FS-Cache cancel some writes that aren't in the middle of
actually being performed.  This means that some data won't make it into the
cache this time.  To support this, a new FS-Cache function is added
fscache_maybe_release_page() that replaces what the netfs releasepage()
functions used to do with respect to the cache.

The decisions fscache_maybe_release_page() makes are counted and displayed
through /proc/fs/fscache/stats on a line labelled "VmScan".  There are four
counters provided: "nos=N" - pages that weren't pending storage; "gon=N" -
pages that were pending storage when we first looked, but weren't by the time
we got the object lock; "bsy=N" - pages that we ignored as they were actively
being written when we looked; and "can=N" - pages that we cancelled the storage
of.

What I'd really like to do is alter the behaviour of the cancellation
heuristics, depending on how necessary it is to expel pages.  If there are
plenty of other pages that aren't waiting to be written to the cache that
could be ejected first, then it would be nice to hold up on immediate
cancellation of cache writes - but I don't see a way of doing that.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:35 +00:00
..
ABI Documentation: ABI: /sys/devices/system/cpu/cpu#/node 2009-10-30 14:59:53 -07:00
accounting Documentation/: fix warnings from -Wmissing-prototypes in HOSTCFLAGS 2009-09-23 07:39:28 -07:00
acpi
aoe
arm ARM: 5738/1: Correct TCM documentation 2009-10-01 16:26:16 +01:00
auxdisplay includecheck fix: Documentation, cfag12864b-example.c 2009-09-24 07:20:57 -07:00
blackfin
block
blockdev
cdrom
cgroups cgroups: update documentation of cgroups tasks and procs files 2009-10-08 07:36:39 -07:00
connector connector: Provide the sender's credentials to the callback 2009-10-02 10:54:01 -07:00
console
cpu-freq [CPUFREQ] update Doc for cpuinfo_cur_freq and scaling_cur_freq 2009-09-01 12:45:09 -04:00
cpuidle
cris
crypto async_tx: add support for asynchronous RAID6 recovery operations 2009-08-29 19:09:27 -07:00
development-process
device-mapper
DocBook Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2009-09-22 07:51:45 -07:00
driver-model
dvb V4L/DVB (12902): Documentation: synchronize documentation for Technisat cards 2009-09-19 00:14:32 -03:00
early-userspace
fault-injection
fb matroxfb: make CONFIG_FB_MATROX_MULTIHEAD=y mandatory 2009-09-23 07:39:56 -07:00
filesystems FS-Cache: Handle pages pending storage that get evicted under OOM conditions 2009-11-19 18:11:35 +00:00
firmware_class
frv
hwmon hwmon: enhance the sysfs API for power meters 2009-10-29 07:39:30 -07:00
i2c i2c-piix4: Modify code name SB900 to Hudson-2 2009-11-07 13:10:46 +01:00
i2o
ia64 Documentation/: fix warnings from -Wmissing-prototypes in HOSTCFLAGS 2009-09-23 07:39:28 -07:00
ide
infiniband IB: Fix typo in udev rule documentation 2009-10-07 15:35:55 -07:00
input Input: add new driver for Sentelic Finger Sensing Pad 2009-08-19 21:46:09 -07:00
ioctl drivers/char/uv_mmtimer.c: add memory mapped RTC driver for UV 2009-09-24 07:21:03 -07:00
isdn Documentation: expand isdn/INTERFACE.CAPI document 2009-10-06 22:20:51 -07:00
ja_JP
kbuild kbuild: introduce ld-option 2009-09-20 12:27:42 +02:00
kdump
ko_KR
kvm KVM: Document KVM_CAP_IRQCHIP 2009-09-10 10:46:55 +03:00
laptops Merge branch 'thinkpad-2.6.32-part2' into release 2009-09-26 01:08:55 -04:00
lguest virtio: let header files include virtio_ids.h 2009-10-22 16:39:28 +10:30
m68k
make
mips
misc-devices max6875: Discard obsolete detect method 2009-10-04 22:53:41 +02:00
mn10300
mtd
namespaces
netlabel
networking pktgen: Fix multiqueue handling 2009-10-04 21:08:54 -07:00
parisc
PCI PCI: document PCIe fundamental reset interfaces 2009-09-09 13:29:38 -07:00
pcmcia Documentation/: fix warnings from -Wmissing-prototypes in HOSTCFLAGS 2009-09-23 07:39:28 -07:00
power Merge git://git.infradead.org/battery-2.6 2009-09-23 10:11:08 -07:00
powerpc Merge git://git.infradead.org/mtd-2.6 2009-09-23 10:07:49 -07:00
pps
prctl
RCU rcu: Remove CONFIG_PREEMPT_RCU 2009-08-23 10:32:40 +02:00
s390 [S390] s390dbf: Add description for usage of "%s" in sprintf events 2009-09-11 10:29:53 +02:00
scheduler
scsi [SCSI] hptiop: Add RR44xx adapter support 2009-10-02 09:45:22 -05:00
serial
sh
sound ALSA: dummy - Fix descriptions of pcm_substreams parameter 2009-11-02 14:11:55 +01:00
sparc
spi spi: fix spelling of `automatically' in documentation 2009-09-23 07:39:44 -07:00
sysctl Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 2009-09-24 07:53:22 -07:00
telephony
thermal thermal: sysfs-api.txt - document passive attribute for thermal zones 2009-11-05 18:11:18 -05:00
timers
trace tracing: Fix comment typo and documentation example 2009-10-24 11:07:50 +02:00
uml
usb USB: Fix sysfs paths in documentation 2009-09-23 06:46:41 -07:00
video4linux Documentation/: fix warnings from -Wmissing-prototypes in HOSTCFLAGS 2009-09-23 07:39:28 -07:00
vm Merge branch 'hostprogs-wmissing-prototypes' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux-misc 2009-11-17 09:14:49 -08:00
w1 ds2482: Discard obsolete detect method 2009-10-04 22:53:41 +02:00
watchdog Documentation/: fix warnings from -Wmissing-prototypes in HOSTCFLAGS 2009-09-23 07:39:28 -07:00
wimax
x86 USB: ehci-dbgp,documentation: Documentation updates for ehci-dbgp 2009-09-23 06:46:39 -07:00
zh_CN
00-INDEX Bluetooth: Add documentation for Marvell Bluetooth driver 2009-08-22 14:25:32 -07:00
applying-patches.txt
atomic_ops.txt
bad_memory.txt
basic_profiling.txt
binfmt_misc.txt
braille-console.txt
bt8xxgpio.txt
btmrvl.txt Bluetooth: Add documentation for Marvell Bluetooth driver 2009-08-22 14:25:32 -07:00
BUG-HUNTING
c2port.txt
cachetlb.txt
Changes
CodingStyle
cpu-hotplug.txt
cpu-load.txt
cputopology.txt Documentation: ABI: /sys/devices/system/cpu/cpu#/ topology files 2009-10-30 14:59:52 -07:00
credentials.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt ieee1394: update URLs in debugging-via-ohci1394.txt 2009-10-03 09:28:11 +02:00
dell_rbu.txt
devices.txt
DMA-API.txt
DMA-attributes.txt
DMA-ISA-LPC.txt
DMA-mapping.txt
dmaengine.txt
dontdiff sparc: Kill PROM console driver. 2009-09-15 17:04:38 -07:00
dynamic-debug-howto.txt
edac.txt
eisa.txt
email-clients.txt
feature-removal-schedule.txt inotify: deprecate the inotify kernel interface 2009-10-18 15:49:38 -04:00
flexible-arrays.txt Update flex_arrays.txt 2009-10-15 07:25:20 -06:00
futex-requeue-pi.txt
gcov.txt trivial: fix typo in CONFIG_DEBUG_FS in gcov doc 2009-09-21 15:14:56 +02:00
gpio.txt gpiolib: allow poll() on value 2009-09-23 07:39:48 -07:00
highuid.txt
HOWTO
hw_random.txt
ics932s401
initrd.txt
intel_txt.txt
Intel-IOMMU.txt intel-iommu: Kill DMAR_BROKEN_GFX_WA option. 2009-09-19 09:37:23 -07:00
io_ordering.txt
io-mapping.txt
IO-mapping.txt
iostats.txt
IPMI.txt
IRQ-affinity.txt
IRQ.txt
irqflags-tracing.txt
isapnp.txt
java.txt
kernel-doc-nano-HOWTO.txt kernel-doc: allow multi-line declaration purpose descriptions 2009-09-18 09:48:52 -07:00
kernel-docs.txt
kernel-parameters.txt x86: earlyprintk: Fix regression to handle serial,ttySn as 1 arg 2009-10-01 10:34:16 +02:00
keys-request-key.txt
keys.txt KEYS: Add a keyctl to install a process's session keyring on its parent [try #6] 2009-09-02 21:29:22 +10:00
kmemcheck.txt
kmemleak.txt kmemleak: add clear command support 2009-09-08 16:36:08 +01:00
kobject.txt
kprobes.txt
kref.txt kref: double kref_put() in my_data_handler() 2009-09-18 09:48:52 -07:00
ldm.txt
leds-class.txt led: document sysfs interface 2009-08-28 15:21:12 -04:00
leds-lp3944.txt
local_ops.txt
lockdep-design.txt lockdep: Fix typos in documentation 2009-08-07 12:03:46 +02:00
lockstat.txt
logo.gif
logo.txt
magic-number.txt
Makefile
ManagementStyle
mca.txt
md.txt
memory-barriers.txt
memory-hotplug.txt
memory.txt Documentation/memory.txt: remove some very outdated recommendations 2009-09-22 07:17:26 -07:00
mono.txt
mutex-design.txt
nmi_watchdog.txt
nommu-mmap.txt
numastat.txt mm: fix NUMA accounting in numastat.txt 2009-09-22 07:17:39 -07:00
oops-tracing.txt
parport-lowlevel.txt
parport.txt
pi-futex.txt
pnp.txt
preempt-locking.txt
printk-formats.txt
prio_tree.txt
rbtree.txt
rfkill.txt
robust-futex-ABI.txt
robust-futexes.txt
rt-mutex-design.txt
rt-mutex.txt
rtc.txt rtc: add boot_timesource sysfs attribute 2009-09-23 07:39:46 -07:00
SAK.txt
SecurityBugs
SELinux.txt
serial-console.txt
sgi-ioc4.txt
sgi-visws.txt
slow-work.txt SLOW_WORK: Allow a requeueable work item to sleep till the thread is needed 2009-11-19 18:10:57 +00:00
SM501.txt
Smack.txt
sparse.txt
spinlocks.txt
stable_api_nonsense.txt
stable_kernel_rules.txt
SubmitChecklist
SubmittingDrivers
SubmittingPatches docs: update patch size in SubmittingPatches 2009-10-01 16:11:12 -07:00
svga.txt
sysfs-rules.txt
sysrq.txt sysrq, kdump: make sysrq-c consistent 2009-07-29 19:10:36 -07:00
tomoyo.txt
unaligned-memory-access.txt
unicode.txt
unshare.txt
VGA-softcursor.txt
vgaarbiter.txt PCI/VGA: add VGA arbitration documentation 2009-09-09 13:29:42 -07:00
video-output.txt
volatile-considered-harmful.txt
voyager.txt
zorro.txt