This produces slightly better performance than the inline assembly, and has the added benefit that it should be portable to other systems that use gcc, not just x86-64.

Here are the results on my "AMD Athlon(tm) 7450 Dual-Core Processor" with "gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3":

with portable 64H macros:
  camellia      : Schedule at 1659
  camellia [ 23]: Encrypt at 431, Decrypt at 434
  whirlpool     : Process at 55

with inline assembly (with "memory" clobber for correctness):
  camellia      : Schedule at 1380
  camellia [ 23]: Encrypt at 406, Decrypt at 403
  whirlpool     : Process at 50

with __builtin_bswap64:
  camellia      : Schedule at 1352
  camellia [ 23]: Encrypt at 396, Decrypt at 391
  whirlpool     : Process at 46
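For readers unfamiliar with the technique, here is a minimal sketch of how a 64-bit big-endian load and store might be written with __builtin_bswap64, next to a portable byte-by-byte version. The function names (load64h_portable, load64h_bswap, store64h_bswap) are hypothetical and are not the library's actual macros. The endianness check relies on gcc's __BYTE_ORDER__ predefined macro, which is an assumption here; where that macro is unavailable the sketch falls back to the portable path.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper names for illustration; not the library's macros. */

/* Portable: assemble the value byte by byte, most significant byte first. */
static uint64_t load64h_portable(const unsigned char *p)
{
    return ((uint64_t)p[0] << 56) | ((uint64_t)p[1] << 48) |
           ((uint64_t)p[2] << 40) | ((uint64_t)p[3] << 32) |
           ((uint64_t)p[4] << 24) | ((uint64_t)p[5] << 16) |
           ((uint64_t)p[6] <<  8) |  (uint64_t)p[7];
}

/* gcc builtin: copy the raw bytes, then byte-swap on little-endian hosts.
 * Assumes __BYTE_ORDER__ is defined by the compiler; otherwise the
 * portable version is used. */
static uint64_t load64h_bswap(const unsigned char *p)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    uint64_t x;
    memcpy(&x, p, sizeof x);   /* avoids unaligned or aliasing access */
    return __builtin_bswap64(x);
#else
    return load64h_portable(p);
#endif
}

/* Matching store: write x to p in big-endian byte order. */
static void store64h_bswap(uint64_t x, unsigned char *p)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    x = __builtin_bswap64(x);
    memcpy(p, &x, sizeof x);
#else
    int i;
    for (i = 7; i >= 0; i--) {  /* least significant byte goes last */
        p[i] = (unsigned char)(x & 0xff);
        x >>= 8;
    }
#endif
}
```

Because the builtin is expressed in C rather than in x86-64 assembly, the compiler is free to pick whatever byte-swap instruction the target offers, which is what makes this approach portable across gcc targets.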
See doc/crypt.pdf