unsigned long is 32 bits wide when compiling with the compiler flag
"-mx32", but the digit size of the math libraries is still 64 bits,
which led to buggy ECC code.
Therefore, define a new type ltc_mp_digit with the correct width and
use that as the return value of get_digit().
This has been tested with all three math providers.
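In outline, the change amounts to something like this (a sketch; the
exact preprocessor guards and descriptor plumbing in LTC's headers
may differ):

    /* Sketch: give the math-descriptor digit its own type that
     * matches the providers' 64-bit digits, instead of relying on
     * unsigned long (which is only 32 bits under -mx32). The
     * concrete typedef shown here is illustrative. */
    #include <stdint.h>

    typedef uint64_t ltc_mp_digit;

    /* get_digit() now returns the full-width digit rather than a
     * possibly-truncated unsigned long. */
    ltc_mp_digit get_digit(void *a, int n);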
Because many of the hash-functions implemented by LTC use the length
of the input when padding the input out to a block-length, LTC keeps
track of the input length in a 64-bit integer. However, it did not
previously test for overflow of this value. Since many of the
hash-functions implemented by LTC are defined for inputs of up to
2^128 bits or more, this means that LTC was implementing these hash
functions incorrectly for extremely long inputs. This might also
have been a minor security problem: a clever attacker might have
been able to take a message with a known hash and find another
message (longer by 2^64 bits) that LTC would hash to the same value.
Fortunately, LTC uses a pre-processor macro to make the actual code
for hashing, and so this problem could be fixed by adding an
overflow-check to that macro.
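For illustration, here is a self-contained sketch of the kind of
check the macro now performs (the function name and error value are
made up; the real change lives inside LTC's hash-process macro):

    #include <stdint.h>

    /* The hash state counts message bits in a 64-bit integer.
     * Before adding another block's worth of bits, verify the
     * addition will not wrap; if it would, report an error instead
     * of silently producing a wrong digest. */
    static int add_block_bits(uint64_t *length, uint64_t block_bits)
    {
        if (*length + block_bits < *length) {
            return -1;   /* overflow: input too long to hash */
        }
        *length += block_bits;
        return 0;
    }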
First of all, clang had a failure in SEED:
LTC_KSEED failed for x=0, I got:
expected actual (ciphertext)
5e == 5e
ba == ba
c6 == c6
e0 == e0
05 != 00
4e != 00
16 != 00
68 != 00
19 == 19
af == af
f1 == f1
cc == cc
6d != 00
34 != 00
6c != 00
db != 00
Since SEED uses the 32H macros, this is really analogous to the
problem I saw with the 64H macros in Camellia with gcc. I'm not sure
why gcc only had a problem with 64H and not 32H, but since this is
an interaction with the optimizer, it's not going to happen every
time the macro is used (which is why the store tests pass; only when
you get into the complexity of a real cipher do problems start to
appear), and it makes sense that it will vary from compiler to
compiler.
Anyway, I went ahead and added the ability to use __builtin_bswap32,
in addition to __builtin_bswap64, which I had already added in a
previous commit. This solves the problem for clang, although I had
to add new logic to detect the bswap builtins under clang, since it
exposes them differently than gcc does (see the comments in the
code). The detection logic was complicated enough, and applies to
both the 32H and 64H macros, so I factored it out into
tomcrypt_cfg.h.
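In outline, the detection amounts to something like the following
sketch (the macro name and exact version cutoff are illustrative;
the real logic and its rationale are in tomcrypt_cfg.h):

    /* Sketch of bswap-builtin detection. clang provides
     * __has_builtin, so the builtins can be probed directly; gcc
     * does not, so it is detected by version instead (gcc gained
     * __builtin_bswap32/64 in 4.3). */
    #if defined(__clang__) && defined(__has_builtin)
      #if __has_builtin(__builtin_bswap32) && \
          __has_builtin(__builtin_bswap64)
        #define LTC_HAVE_BSWAP_BUILTIN
      #endif
    #elif defined(__GNUC__) && \
          (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
      #define LTC_HAVE_BSWAP_BUILTIN
    #endif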
This produces slightly better performance than the inline assembly,
and has the added benefit that it should be portable to other systems
that use gcc, not just x86-64.
Here are the results on my "AMD Athlon(tm) 7450 Dual-Core Processor"
with "gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3":
with portable 64H macros:
camellia : Schedule at 1659
camellia [ 23]: Encrypt at 431, Decrypt at 434
whirlpool : Process at 55
with inline assembly (with "memory clobber" for correctness):
camellia : Schedule at 1380
camellia [ 23]: Encrypt at 406, Decrypt at 403
whirlpool : Process at 50
with __builtin_bswap64:
camellia : Schedule at 1352
camellia [ 23]: Encrypt at 396, Decrypt at 391
whirlpool : Process at 46
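For reference, the builtin-based variant boils down to something
like this sketch (guarded in the real headers by the detection logic
described elsewhere, and assuming a little-endian host):

    #include <string.h>

    /* Sketch of a LOAD64H-style macro built on the compiler
     * builtin. memcpy sidesteps alignment and strict-aliasing
     * issues, and the builtin compiles down to a single bswap
     * instruction, which is where the speedup over the inline
     * assembly comes from. */
    #define LOAD64H(x, y)                                       \
        do {                                                    \
            memcpy(&(x), (y), 8);       /* fetch 8 raw bytes */ \
            (x) = __builtin_bswap64(x); /* BE -> host order  */ \
        } while (0)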
This had been causing Camellia (the only cipher that uses these
macros) to fail when compiling "out-of-the-box" with gcc version
"4.3.3-5ubuntu4". I think this is because the compiler had no idea
that any memory access was going on in these macros.
Adding "memory" as a clobber solves the problem, but is probably
overkill. I suspect that if we specify the constraint for y
differently, we could get rid of both "memory" and __volatile__, which
would allow the compiler to optimize much more.
Also, in gcc versions that support it, we should probably use the
bswap builtins instead.
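For reference, the fixed assembly looks roughly like this (a sketch
with the constraints as described above):

    /* Sketch of the x86-64 STORE64H with the "memory" clobber
     * added. The asm writes through the pointer y, which gcc
     * cannot see from the operand list alone; the clobber forces
     * it to assume memory changed. The second bswapq restores x,
     * since "r"(x) is declared as an input only. */
    #define STORE64H(x, y)                        \
        __asm__ __volatile__ (                    \
            "bswapq %0      \n\t"                 \
            "movq   %0, (%1)\n\t"                 \
            "bswapq %0      \n\t"                 \
            : : "r" (x), "r" (y) : "memory")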
This seemed to be the only place in the code that was using this
particular transposition. And, indeed, when compiling with
"GMP_DESC", it looks like it is necessary to disable Diffie-Hellman.
(Otherwise, the test fails for me.)