As described in a previous patch and documented in mm/filemap.h,
copy_from_user_inatomic* shouldn't zero out the tail of the buffer after an
incomplete copy.
This patch implements that change for i386.
For the _nocache version, a new __copy_user_intel_nocache is defined similar
to copy_user_zeroio_intel_nocache, and this is ultimately used for the copy.
For the regular version, __copy_from_user_ll_nozero is defined which uses
__copy_user and __copy_user_intel - the later needs casts to reposition the
__user annotations.
If copy_from_user_atomic is given a constant length of 1, 2, or 4, then we do
still zero the destintion on failure. This didn't seem worth the effort of
fixing as the places where it is used really don't care.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use the x86 cache-bypassing copy instructions for copy_from_user().
Some performance data are
Total of GLOBAL_POWER_EVENTS (CPU cycle samples)
2.6.12.4.orig 1921587
2.6.12.4.nt 1599424
1599424/1921587=83.23% (16.77% reduction)
BSQ_CACHE_REFERENCE (L3 cache miss)
2.6.12.4.orig 57427
2.6.12.4.nt 20858
20858/57427=36.32% (63.7% reduction)
L3 cache miss reduction of __copy_from_user_ll
samples %
37408 65.1412 vmlinux __copy_from_user_ll
23 0.1103 vmlinux __copy_user_zeroing_intel_nocache
23/37408=0.061% (99.94% reduction)
Top 5 of 2.6.12.4.nt
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % app name symbol name
128392 8.0274 vmlinux __copy_user_zeroing_intel_nocache
64206 4.0143 vmlinux journal_add_journal_head
59746 3.7355 vmlinux do_get_write_access
47674 2.9807 vmlinux journal_put_journal_head
46021 2.8774 vmlinux journal_dirty_metadata
pattern9-0-cpu4-0-09011728/summary.out
Counted BSQ_CACHE_REFERENCE events (cache references seen by the bus unit) with a unit mask of 0x3f (multiple flags) count 3000
samples % app name symbol name
69755 4.2861 vmlinux __copy_user_zeroing_intel_nocache
55685 3.4215 vmlinux journal_add_journal_head
52371 3.2179 vmlinux __find_get_block
45504 2.7960 vmlinux journal_put_journal_head
36005 2.2123 vmlinux journal_stop
pattern9-0-cpu4-0-09011744/summary.out
Counted BSQ_CACHE_REFERENCE events (cache references seen by the bus unit) with a unit mask of 0x200 (read 3rd level cache miss) count 3000
samples % app name symbol name
1147 5.4994 vmlinux journal_add_journal_head
881 4.2240 vmlinux journal_dirty_data
872 4.1809 vmlinux blk_rq_map_sg
734 3.5192 vmlinux journal_commit_transaction
617 2.9582 vmlinux radix_tree_delete
pattern9-0-cpu4-0-09011731/summary.out
iozone results are
original 2.6.12.4 CPU time = 207.768 sec
cache aware CPU time = 184.783 sec
(three times run)
184.783/207.768=88.94% (11.06% reduction)
original:
pattern9-0-cpu4-0-08191720/iozone.out: CPU Utilization: Wall time 45.997 CPU time 64.527 CPU utilization 140.28 %
pattern9-0-cpu4-0-08191741/iozone.out: CPU Utilization: Wall time 46.878 CPU time 71.933 CPU utilization 153.45 %
pattern9-0-cpu4-0-08191743/iozone.out: CPU Utilization: Wall time 45.152 CPU time 71.308 CPU utilization 157.93 %
cache awre:
pattern9-0-cpu4-0-09011728/iozone.out: CPU Utilization: Wall time 44.842 CPU time 62.465 CPU utilization 139.30 %
pattern9-0-cpu4-0-09011731/iozone.out: CPU Utilization: Wall time 44.718 CPU time 59.273 CPU utilization 132.55 %
pattern9-0-cpu4-0-09011744/iozone.out: CPU Utilization: Wall time 44.367 CPU time 63.045 CPU utilization 142.10 %
Signed-off-by: Hiro Yoshioka <hyoshiok@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Several implementations were essentialy a common piece of C code using
the cmpxchg() macro. Put the implementation in one spot that everyone
can share, and convert sparc64 over to using this.
Alpha is the lone arch-specific implementation, which codes up a
special fast path for the common case in order to avoid GP reloading
which a pure C version would require.
Signed-off-by: David S. Miller <davem@davemloft.net>
* EXPORT_SYMBOL's moved to other files
* #include <linux/config.h>, <linux/module.h> where needed
* #include's in i386_ksyms.c cleaned up
* After copy-paste, redundant due to Makefiles rules preprocessor directives
removed:
#ifdef CONFIG_FOO
EXPORT_SYMBOL(foo);
#endif
obj-$(CONFIG_FOO) += foo.o
* Tiny reformat to fit in 80 columns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch implements a number of smp_processor_id() cleanup ideas that
Arjan van de Ven and I came up with.
The previous __smp_processor_id/_smp_processor_id/smp_processor_id API
spaghetti was hard to follow both on the implementational and on the
usage side.
Some of the complexity arose from picking wrong names, some of the
complexity comes from the fact that not all architectures defined
__smp_processor_id.
In the new code, there are two externally visible symbols:
- smp_processor_id(): debug variant.
- raw_smp_processor_id(): nondebug variant. Replaces all existing
uses of _smp_processor_id() and __smp_processor_id(). Defined
by every SMP architecture in include/asm-*/smp.h.
There is one new internal symbol, dependent on DEBUG_PREEMPT:
- debug_smp_processor_id(): internal debug variant, mapped to
smp_processor_id().
Also, i moved debug_smp_processor_id() from lib/kernel_lock.c into a new
lib/smp_processor_id.c file. All related comments got updated and/or
clarified.
I have build/boot tested the following 8 .config combinations on x86:
{SMP,UP} x {PREEMPT,!PREEMPT} x {DEBUG_PREEMPT,!DEBUG_PREEMPT}
I have also build/boot tested x64 on UP/PREEMPT/DEBUG_PREEMPT. (Other
architectures are untested, but should work just fine.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!