4277eedd79
Optimize integer-to-string conversion in vsprintf.c for base 10. This is by far the most used conversion, and in some use cases it impacts performance. For example, top reads /proc/$PID/stat for every process, and with 4000 processes decimal conversion alone takes noticeable time. Using code from http://www.cs.uiowa.edu/~jones/bcd/decimal.html (with permission from the author, Douglas W. Jones) binary-to-decimal-string conversion is done in groups of five digits at once, using only additions/subtractions/shifts (with -O2; -Os throws in some multiply instructions). On i386 arch gcc 4.1.2 -O2 generates ~500 bytes of code. This patch is run tested. Userspace benchmark/test is also attached. I tested it on PIII and AMD64 and new code is generally ~2.5 times faster. On AMD64: # ./vsprintf_verify-O2 Original decimal conv: .......... 151 ns per iteration Patched decimal conv: .......... 62 ns per iteration Testing correctness 12895992590592 ok... [Ctrl-C] # ./vsprintf_verify-O2 Original decimal conv: .......... 151 ns per iteration Patched decimal conv: .......... 62 ns per iteration Testing correctness 26025406464 ok... [Ctrl-C] More realistic test: top from busybox project was modified to report how many us it took to scan /proc (this does not account any processing done after that, like sorting process list), and then I test it with 4000 processes: #!/bin/sh i=4000 while test $i != 0; do sleep 30 & let i-- done busybox top -b -n3 >/dev/null on unpatched kernel: top: 4120 processes took 102864 microseconds to scan top: 4120 processes took 91757 microseconds to scan top: 4120 processes took 92517 microseconds to scan top: 4120 processes took 92581 microseconds to scan on patched kernel: top: 4120 processes took 75460 microseconds to scan top: 4120 processes took 66451 microseconds to scan top: 4120 processes took 67267 microseconds to scan top: 4120 processes took 67618 microseconds to scan The speedup comes from much faster generation of /proc/PID/stat by sprintf() calls inside the kernel. Signed-off-by: Douglas W Jones <jones@cs.uiowa.edu> Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
---|---|---|
.. | ||
lzo | ||
reed_solomon | ||
zlib_deflate | ||
zlib_inflate | ||
.gitignore | ||
audit.c | ||
bitmap.c | ||
bitrev.c | ||
bug.c | ||
bust_spinlocks.c | ||
check_signature.c | ||
cmdline.c | ||
cpumask.c | ||
crc16.c | ||
crc32.c | ||
crc32defs.h | ||
crc-ccitt.c | ||
crc-itu-t.c | ||
ctype.c | ||
debug_locks.c | ||
dec_and_lock.c | ||
devres.c | ||
div64.c | ||
dump_stack.c | ||
extable.c | ||
fault-inject.c | ||
find_next_bit.c | ||
gen_crc32table.c | ||
genalloc.c | ||
halfmd4.c | ||
hexdump.c | ||
hweight.c | ||
idr.c | ||
inflate.c | ||
int_sqrt.c | ||
iomap_copy.c | ||
iomap.c | ||
ioremap.c | ||
irq_regs.c | ||
Kconfig | ||
Kconfig.debug | ||
kernel_lock.c | ||
klist.c | ||
kobject_uevent.c | ||
kobject.c | ||
kref.c | ||
libcrc32c.c | ||
list_debug.c | ||
locking-selftest-hardirq.h | ||
locking-selftest-mutex.h | ||
locking-selftest-rlock-hardirq.h | ||
locking-selftest-rlock-softirq.h | ||
locking-selftest-rlock.h | ||
locking-selftest-rsem.h | ||
locking-selftest-softirq.h | ||
locking-selftest-spin-hardirq.h | ||
locking-selftest-spin-softirq.h | ||
locking-selftest-spin.h | ||
locking-selftest-wlock-hardirq.h | ||
locking-selftest-wlock-softirq.h | ||
locking-selftest-wlock.h | ||
locking-selftest-wsem.h | ||
locking-selftest.c | ||
Makefile | ||
parser.c | ||
percpu_counter.c | ||
plist.c | ||
prio_tree.c | ||
radix-tree.c | ||
random32.c | ||
rbtree.c | ||
reciprocal_div.c | ||
rwsem-spinlock.c | ||
rwsem.c | ||
semaphore-sleepers.c | ||
sha1.c | ||
smp_processor_id.c | ||
sort.c | ||
spinlock_debug.c | ||
string.c | ||
swiotlb.c | ||
textsearch.c | ||
ts_bm.c | ||
ts_fsm.c | ||
ts_kmp.c | ||
vsprintf.c |