android_kernel_xiaomi_sm8350/include/asm-x86
Nick Piggin 8174c430e4 x86: lockless get_user_pages_fast()
Implement get_user_pages_fast without locking in the fastpath on x86.

Do an optimistic lockless pagetable walk, without taking mmap_sem or any
page table locks.  Page table existence is guaranteed by turning
interrupts off; combined with the fact that we're always looking up the
current mm, this means we can do the lockless page table walk within the
constraints of the TLB shootdown design.  Basically we can do this
lockless pagetable walk in much the same way that the CPU's pagetable
walker finds present ptes without taking any locks.

This patch (combined with the subsequent ones to convert direct IO to use
it) was found to give about 10% performance improvement on a 2 socket 8
core Intel Xeon system running an OLTP workload on DB2 v9.5.

 "To test the effects of the patch, an OLTP workload was run on an IBM
  x3850 M2 server with 2 processors (quad-core Intel Xeon processors at
  2.93 GHz) using IBM DB2 v9.5 running Linux 2.6.24rc7 kernel.  Comparing
  runs with and without the patch resulted in an overall performance
  benefit of ~9.8%.  Correspondingly, oprofiles showed that samples from
  __up_read and __down_read routines that is seen during thread contention
  for system resources was reduced from 2.8% down to .05%.  Monitoring the
  /proc/vmstat output from the patched run showed that the counter for
  fast_gup contained a very high number while the fast_gup_slow value was
  zero."

(fast_gup is the old name for get_user_pages_fast, fast_gup_slow is a
counter we had for the number of times the slowpath was invoked).

The main reason for the improvement is that DB2 has multiple threads each
issuing direct-IO.  Direct-IO uses get_user_pages, and thus the threads
contend on the mmap_sem cacheline, and can also contend on page table locks.

I would anticipate larger performance gains on larger systems.  However, I
think DB2 uses an adaptive mix of threads and processes, so it could be
that thread contention remains pretty constant as machine size increases.
In that case, we are stuck with "only" a 10% gain.

The downside of using get_user_pages_fast is that if there is not a pte
with the correct permissions for the access, we end up falling back to
get_user_pages, so the get_user_pages_fast attempt is a bit of wasted
work.  However, this should not be the common case in most performance
critical code.
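
The fast-path/slow-path split described above can be sketched as a small
user-space toy model.  This is only an illustration under stated
assumptions, not the kernel implementation: struct toy_mm, the TOY_PTE_*
flags and the toy_* functions are all hypothetical names, the "page
table" is a flat array, and the slow path merely sets the missing bits
where the real get_user_pages would take mmap_sem and fault pages in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for the toy model (not the real x86 pte layout). */
#define TOY_PTE_PRESENT 0x1u
#define TOY_PTE_WRITE   0x2u
#define TOY_NR_PAGES    8

/* Hypothetical flat "page table": one flags word per virtual page. */
struct toy_mm {
	uint32_t pte[TOY_NR_PAGES];
	unsigned long fast_hits;   /* pages handled by the fast path */
	unsigned long slow_calls;  /* number of slow path invocations */
};

/*
 * Fast path: read the entries without taking any lock and stop at the
 * first one that lacks the permissions needed for the access.
 * Returns the number of pages handled.
 */
static int toy_gup_fast(struct toy_mm *mm, size_t start, int nr, int write)
{
	uint32_t need = TOY_PTE_PRESENT | (write ? TOY_PTE_WRITE : 0);
	int done = 0;

	while (done < nr) {
		uint32_t pte = mm->pte[start + done];

		if ((pte & need) != need)
			break;
		done++;
	}
	mm->fast_hits += done;
	return done;
}

/*
 * Slow path stand-in: here it simply grants the missing permissions and
 * counts the call; locking and faulting are deliberately left out.
 */
static int toy_gup_slow(struct toy_mm *mm, size_t start, int nr, int write)
{
	uint32_t grant = TOY_PTE_PRESENT | (write ? TOY_PTE_WRITE : 0);
	int i;

	mm->slow_calls++;
	for (i = 0; i < nr; i++)
		mm->pte[start + i] |= grant;
	return nr;
}

/* Try the fast path first; hand only the remainder to the slow path. */
static int toy_get_user_pages(struct toy_mm *mm, size_t start, int nr, int write)
{
	int done;

	assert(nr >= 0 && start + (size_t)nr <= TOY_NR_PAGES);

	done = toy_gup_fast(mm, start, nr, write);
	if (done < nr)
		done += toy_gup_slow(mm, start + done, nr - done, write);
	return done;
}
```

In this model, a request that hits only present, writable entries never
touches the slow path counter, which mirrors the fast_gup_slow value of
zero reported in the benchmark.  A request that runs into an entry
without write permission pays for the partial fast walk and then for the
slow path as well, which is the extra work the paragraph above refers to.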

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: Kconfig fix]
[akpm@linux-foundation.org: Makefile fix/cleanup]
[akpm@linux-foundation.org: warning fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:06 -07:00
mach-bigsmp x86: APIC: remove apic_write_around(); use alternatives 2008-07-18 12:51:21 +02:00
mach-default x86: convert Dprintk to pr_debug 2008-07-21 21:35:38 +02:00
mach-es7000 x86: APIC: remove apic_write_around(); use alternatives 2008-07-18 12:51:21 +02:00
mach-generic x86: extend and use x86_quirks to clean up NUMAQ code 2008-07-20 09:25:52 +02:00
mach-numaq x86: remove unused file after numaq etc depends on genericarch 2008-07-08 10:38:54 +02:00
mach-rdc321x
mach-summit x86: APIC: remove apic_write_around(); use alternatives 2008-07-18 12:51:21 +02:00
mach-voyager Merge branch 'generic-ipi' into generic-ipi-for-linus 2008-07-15 21:55:59 +02:00
uv x86 BIOS interface for RTC on SGI UV 2008-07-18 14:35:14 +02:00
visws x86, VisWS: turn into generic arch, move definitions 2008-07-10 18:55:40 +02:00
xen x86: rename PTE_MASK to PTE_PFN_MASK 2008-07-22 10:43:44 +02:00
a.out-core.h
a.out.h
acpi.h x86: use acpi_numa_init to parse on 32-bit numa 2008-07-08 10:38:47 +02:00
agp.h
alternative-asm.h
alternative.h ftrace: use nops instead of jmp 2008-05-23 20:33:28 +02:00
amd_iommu_types.h x86, AMD IOMMU: replace DEVID macro with a function 2008-07-11 18:01:18 +02:00
amd_iommu.h x86, AMD IOMMU: add amd_iommu.h to export functions to the generic x86 dma code 2008-06-27 10:12:21 +02:00
apic.h x86: convert Dprintk to pr_debug 2008-07-21 21:35:38 +02:00
apicdef.h
arch_hooks.h x86: add ->pre_time_init to x86_quirks 2008-07-20 09:25:52 +02:00
asm.h x86: use macros from asm.h. 2008-07-09 09:14:12 +02:00
atomic_32.h
atomic_64.h Merge branch 'x86/uv' into x86/devel 2008-07-08 12:24:13 +02:00
atomic.h
auxvec.h
bios_ebda.h x86: extract common part of head32.c and head64.c into head.c 2008-06-05 15:10:02 +02:00
bitops.h x86, cleanup: fix description of __fls(): __fls(0) is undefined 2008-07-18 14:32:38 +02:00
boot.h x86: cleanup boot-heap usage 2008-04-19 19:19:54 +02:00
bootparam.h x86: move reserve_setup_data to setup.c 2008-07-08 13:16:14 +02:00
bug.h
bugs.h
byteorder.h
cache.h
cacheflush.h
calgary.h
calling.h x86 ptrace: unify syscall tracing 2008-07-16 12:15:17 -07:00
checksum_32.h
checksum_64.h
checksum.h
cmpxchg_32.h
cmpxchg_64.h x86, 64-bit: add sync_cmpxchg 2008-07-08 13:10:58 +02:00
cmpxchg.h
compat.h
cpu.h
cpufeature.h x86: APIC: remove apic_write_around(); use alternatives 2008-07-18 12:51:21 +02:00
cputime.h
current.h x86: unify current.h 2008-05-25 08:58:35 +02:00
debugreg.h
delay.h x86: delay lib unification build fix 2008-07-09 09:13:59 +02:00
desc_defs.h x86, 64-bit: add gate_offset() and gate_segment() macros 2008-07-08 13:10:28 +02:00
desc.h Merge commit 'v2.6.26' into x86/core 2008-07-14 11:37:46 +02:00
device.h dma-mapping: add the device argument to dma_mapping_error() 2008-07-26 12:00:03 -07:00
div64.h remove div_long_long_rem 2008-05-01 08:03:58 -07:00
dma-mapping.h dma-mapping: add the device argument to dma_mapping_error() 2008-07-26 12:00:03 -07:00
dma.h
dmi.h x86: use acpi_numa_init to parse on 32-bit numa 2008-07-08 10:38:47 +02:00
ds.h
dwarf2.h x86: Fix compile error with CONFIG_AS_CFI=n 2008-07-15 15:30:29 +02:00
e820.h x86: seperate memtest from init_64.c 2008-07-18 14:10:27 +02:00
edac.h
efi.h x86: reserve EFI memory map with reserve_early 2008-06-05 15:10:02 +02:00
elf.h x86/paravirt, 64-bit: make load_gs_index() a paravirt operation 2008-07-08 13:15:58 +02:00
emergency-restart.h
errno.h
fb.h
fcntl.h
fixmap_32.h x86: i386: reduce boot fixmap space 2008-07-18 16:17:52 -07:00
fixmap_64.h x86: make 64bit hpet_set_mapping to use ioremap too, v2 2008-07-14 09:24:17 +02:00
fixmap.h x86/paravirt/xen: add set_fixmap pv_mmu_ops 2008-06-20 15:09:56 +02:00
floppy.h
frame.h
ftrace.h ftrace: copy + paste typo in asm/ftrace.h 2008-07-18 13:14:08 +02:00
futex.h asm-*/futex.h should include linux/uaccess.h 2008-04-30 08:29:52 -07:00
gart.h x86: make only GART code include gart.h 2008-07-11 11:00:54 +02:00
genapic_32.h x86: I/O APIC: remove an IRQ2-mask hack 2008-07-13 11:43:48 +02:00
genapic_64.h x86: I/O APIC: remove an IRQ2-mask hack 2008-07-13 11:43:48 +02:00
genapic.h
geode.h x86, geode: add a VSA2 ID for General Software 2008-06-19 14:19:03 +02:00
gpio.h gpiolib: allow user-selection 2008-07-25 10:53:30 -07:00
hardirq_32.h
hardirq_64.h
hardirq.h x86: make /proc/stat account for all interrupts 2008-05-25 07:11:49 +02:00
highmem.h x86: kill bad_ppro 2008-07-08 10:38:19 +02:00
hpet.h x86: merge tsc calibration 2008-07-09 07:43:25 +02:00
hugetlb.h hugetlb: modular state for hugetlb page size 2008-07-24 10:47:17 -07:00
hw_irq.h Merge branch 'generic-ipi' into generic-ipi-for-linus 2008-07-15 21:55:59 +02:00
hypertransport.h
i387.h x86-64: Clean up 'save/restore_i387()' usage 2008-07-24 16:12:40 -07:00
i8253.h
i8259.h x86: automatical unification of i8259.c 2008-05-24 16:44:26 +02:00
ia32_unistd.h
ia32.h
idle.h
intel_arch_perfmon.h
io_32.h access_process_vm device memory infrastructure 2008-07-24 10:47:15 -07:00
io_64.h access_process_vm device memory infrastructure 2008-07-24 10:47:15 -07:00
io_apic.h x86: let setup_arch call init_apic_mappings for 32bit 2008-07-08 13:16:04 +02:00
io.h - x86: move early_ioremap prototypes to io.h 2008-07-08 13:16:12 +02:00
ioctl.h
ioctls.h
iommu.h x86 calgary: fix handling of devices that aren't behind the Calgary 2008-07-26 12:00:03 -07:00
ipcbuf.h
ipi.h Merge branch 'linus' into cpus4096 2008-07-16 00:29:07 +02:00
irq_regs_32.h
irq_regs_64.h
irq_regs.h
irq_vectors.h Merge branch 'generic-ipi' into generic-ipi-for-linus 2008-07-15 21:55:59 +02:00
irq.h x86: unify irq.h 2008-05-12 21:28:05 +02:00
irqflags.h Merge branch 'auto-ftrace-next' into tracing/for-linus 2008-07-14 16:11:52 +02:00
ist.h
k8.h
Kbuild Merge git://git.infradead.org/~dwmw2/random-2.6 2008-07-25 12:01:37 -07:00
kdebug.h x86: mmiotrace full patch, preview 1 2008-05-24 11:22:12 +02:00
kexec.h kexec jump 2008-07-26 12:00:04 -07:00
kgdb.h
kmap_types.h
kprobes.h
kvm_host.h KVM: fix exception entry / build bug, on 64-bit 2008-07-21 11:03:32 +02:00
kvm_para.h x86: KVM guest: Add memory clobber to hypercalls 2008-07-06 11:05:18 +03:00
kvm_x86_emulate.h KVM: x86 emulator: lazily evaluate segment registers 2008-07-20 12:42:35 +03:00
kvm.h KVM: SVM: add tracing support for TDP page faults 2008-07-20 12:40:48 +03:00
ldt.h
lguest_hcall.h
lguest.h
linkage.h
local.h
math_emu.h
mc146818rtc.h
mca_dma.h
mca.h
mce.h
mman.h
mmconfig.h fix build bug in "x86: add PCI extended config space access for AMD Barcelona" 2008-06-10 12:32:53 +02:00
mmu_context_32.h x86: unify mmu_context.h 2008-07-08 13:10:31 +02:00
mmu_context_64.h x86: unify mmu_context.h 2008-07-08 13:10:31 +02:00
mmu_context.h x86: unify mmu_context.h 2008-07-08 13:10:31 +02:00
mmu.h
mmx.h
mmzone_32.h x86: make generic arch support NUMAQ 2008-06-10 11:34:42 +02:00
mmzone_64.h
mmzone.h
module.h
mpspec_def.h x86: increase MAX_APICS for very large x86-64 configs 2008-07-08 12:23:29 +02:00
mpspec.h Merge branch 'x86/uv' into x86/devel 2008-07-08 12:24:13 +02:00
msgbuf.h
msidef.h
msr-index.h x86: cleanup C1E enabled detection 2008-06-10 15:52:07 +02:00
msr.h x86: add memory barriers to wrmsr 2008-07-08 13:10:24 +02:00
mtrr.h
mutex_32.h
mutex_64.h
mutex.h
namei.h
nmi.h x86: nmi_watchdog - introduce nmi_watchdog_active() helper 2008-07-08 12:51:42 +02:00
nops.h
numa_32.h x86: introduce init_memory_mapping for 32bit #3 2008-07-08 13:10:33 +02:00
numa_64.h x86: introduce initmem_init for 64 bit 2008-07-08 12:50:14 +02:00
numa.h
numaq.h x86: fix numaq_tsc_disable calling 2008-07-13 08:19:45 +02:00
olpc.h x86: olpc: add One Laptop Per Child architecture support 2008-04-29 08:06:07 -07:00
page_32.h x86: merge zones_sizes_init for numa and non numa on 32-bit 2008-07-08 13:16:22 +02:00
page_64.h x86: map UV chipset space - pagetable 2008-07-09 07:43:23 +02:00
page.h PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures 2008-07-24 10:47:21 -07:00
param.h
paravirt.h x86: rename PTE_MASK to PTE_PFN_MASK 2008-07-22 10:43:44 +02:00
parport.h
pat.h x86: rename pat_wc_enabled to pat_enabled 2008-06-12 10:14:27 +02:00
pci_32.h x86: add compilation checks to pci_unmap_*() macros 2008-06-30 12:22:01 +02:00
pci_64.h x86: reserve dma32 early for gart 2008-04-19 19:19:55 +02:00
pci-direct.h PCI/x86: early dump pci conf space v2 2008-06-10 10:59:52 -07:00
pci.h x86: move pci_routirq declaration to pci.h 2008-07-08 09:13:08 +02:00
pda.h x86: remove static boot_cpu_pda array v2 2008-07-08 11:31:25 +02:00
percpu.h x86_64: add workaround for no %gs-based percpu 2008-07-16 10:58:13 +02:00
pgalloc.h x86/paravirt: add a pgd_alloc/free hooks 2008-07-08 13:11:01 +02:00
pgtable_32.h x86: add PTE_FLAGS_MASK 2008-07-22 10:43:45 +02:00
pgtable_64.h x86: rename PTE_MASK to PTE_PFN_MASK 2008-07-22 10:43:44 +02:00
pgtable-2level-defs.h
pgtable-2level.h
pgtable-3level-defs.h
pgtable-3level.h x86: rename PTE_MASK to PTE_PFN_MASK 2008-07-22 10:43:44 +02:00
pgtable.h x86: implement pte_special 2008-07-26 12:00:05 -07:00
poll.h
posix_types_32.h
posix_types_64.h
posix_types.h fix asm-x86/{posix_types,unistd}.h 2008-04-26 17:35:46 +02:00
prctl.h
processor-cyrix.h
processor-flags.h x86: fix header export, asm-x86/processor-flags.h, CONFIG_* leaks 2008-07-24 12:49:53 +02:00
processor.h Merge branch 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-07-23 18:37:44 -07:00
proto.h x86: add flags parameter to reserve_bootmem_generic() 2008-07-08 11:49:49 +02:00
ptrace-abi.h x86 ptrace: unify syscall tracing 2008-07-16 12:15:17 -07:00
ptrace.h x86: break mutual header inclusion 2008-06-02 12:48:23 +02:00
pvclock-abi.h x86: Add structs and functions for paravirt clocksource 2008-06-24 21:02:31 +03:00
pvclock.h x86: Add structs and functions for paravirt clocksource 2008-06-24 21:02:31 +03:00
reboot_fixups.h
reboot.h x86: constify data in reboot.c 2008-05-25 08:58:30 +02:00
required-features.h x86, 64-bit: PSE no longer a hard requirement 2008-07-08 13:11:08 +02:00
resource.h
resume-trace.h x86: move tracedata to RODATA 2008-05-25 07:09:47 +02:00
rio.h x86: remove duplicate get_bios_ebda() from rio.h 2008-04-26 17:35:47 +02:00
rtc.h
rwlock.h
rwsem.h
scatterlist.h x86: use dma_length in i386 2008-04-19 19:19:57 +02:00
seccomp_32.h x86: fix incomplete include guard in include/asm-x86/seccomp_32.h 2008-06-02 12:45:28 +02:00
seccomp_64.h x86: more header fixes 2008-06-18 12:27:03 +02:00
seccomp.h
sections.h
segment.h x86: unify and correct the GDT_ENTRY() macro 2008-07-17 11:29:24 -07:00
sembuf.h
serial.h
setup.h Merge branches 'x86/urgent', 'x86/amd-iommu', 'x86/apic', 'x86/cleanups', 'x86/core', 'x86/cpu', 'x86/fixmap', 'x86/gart', 'x86/kprobes', 'x86/memtest', 'x86/modules', 'x86/nmi', 'x86/pat', 'x86/reboot', 'x86/setup', 'x86/step', 'x86/unify-pci', 'x86/uv', 'x86/xen' and 'xen-64bit' into x86/for-linus 2008-07-21 16:37:17 +02:00
shmbuf.h
shmparam.h
sigcontext32.h
sigcontext.h
siginfo.h
signal.h Fix typos from signal_32/64.h merge 2008-07-18 17:59:13 +02:00
smp.h x86_64: unstatic get_local_pda 2008-07-16 10:55:07 +02:00
socket.h
sockios.h
sparsemem.h
spinlock_types.h x86/paravirt: add hooks for spinlock operations 2008-07-16 11:15:52 +02:00
spinlock.h paravirt: introduce a "lock-byte" spinlock implementation 2008-07-16 11:15:53 +02:00
srat.h x86: remove acpi_srat config v2 2008-07-08 15:49:08 +02:00
stacktrace.h
stat.h
statfs.h
string_32.h x86: string_32.h: workaround for broken gcc 4.0 2008-05-26 13:36:53 -07:00
string_64.h
string.h
suspend_32.h x86: more header fixes 2008-06-18 12:27:03 +02:00
suspend_64.h
suspend.h
swiotlb.h dma-mapping: add the device argument to dma_mapping_error() 2008-07-26 12:00:03 -07:00
sync_bitops.h
system_64.h
system.h x86: fix savesegment() bug causing crashes on 64-bit 2008-07-11 19:51:47 +02:00
tce.h
termbits.h
termios.h
therm_throt.h
thread_info.h clean up duplicated alloc/free_thread_info 2008-07-25 10:53:28 -07:00
time.h x86: merge tsc_init and clocksource code 2008-07-09 07:43:27 +02:00
timer.h x86: rename paravirtualized TSC functions 2008-07-09 07:43:28 +02:00
timex.h
tlb.h
tlbflush.h x86: prevent PGE flush from interruption/preemption 2008-05-23 18:16:15 +02:00
topology.h x86: change _node_to_cpumask_ptr to return const ptr 2008-07-13 19:11:58 +02:00
trampoline.h
traps.h x86: introducing asm-x86/traps.h 2008-07-18 18:51:57 +02:00
tsc.h x86: merge tsc_init and clocksource code 2008-07-09 07:43:27 +02:00
types.h x86: types: use <asm-generic/int-*.h> for the x86 architecture 2008-05-02 16:18:42 -07:00
uaccess_32.h x86: define architectural characteristics in uaccess.h. 2008-07-09 09:14:29 +02:00
uaccess_64.h x86: introduce copy_user_handle_tail() routine 2008-07-09 15:51:03 +02:00
uaccess.h x86: lockless get_user_pages_fast() 2008-07-26 12:00:06 -07:00
ucontext.h
unaligned.h kernel: Move arches to use common unaligned access 2008-04-29 08:06:27 -07:00
unistd_32.h flag parameters add-on: remove epoll_create size param 2008-07-24 10:47:29 -07:00
unistd_64.h flag parameters add-on: remove epoll_create size param 2008-07-24 10:47:29 -07:00
unistd.h fix asm-x86/{posix_types,unistd}.h 2008-04-26 17:35:46 +02:00
unwind.h
user32.h
user_32.h
user_64.h
user.h
vdso.h x86_64: further cleanup of 32-bit compat syscall mechanisms 2008-07-16 11:08:27 +02:00
vga.h
vgtod.h
vic.h
vm86.h x86: break mutual header inclusion 2008-06-02 12:48:23 +02:00
vmi_time.h x86: rename paravirtualized TSC functions 2008-07-09 07:43:28 +02:00
vmi.h
voyager.h
vsyscall.h x86: add notrace annotations to vsyscall. 2008-05-23 20:31:39 +02:00
xor_32.h x86: more header fixes 2008-06-18 12:27:03 +02:00
xor_64.h x86: more header fixes 2008-06-18 12:27:03 +02:00
xor.h