added libtommath-0.14

This commit is contained in:
Tom St Denis 2003-03-13 02:11:11 +00:00 committed by Steffen Jaeckel
parent b66471f74f
commit 82f4858291
94 changed files with 600 additions and 418 deletions

BIN
bn.pdf

Binary file not shown.

197
bn.tex

@ -1,15 +1,15 @@
\documentclass{article} \documentclass{article}
\begin{document} \begin{document}
\title{LibTomMath v0.13 \\ A Free Multiple Precision Integer Library} \title{LibTomMath v0.14 \\ A Free Multiple Precision Integer Library \\ http://math.libtomcrypt.org }
\author{Tom St Denis \\ tomstdenis@iahu.ca} \author{Tom St Denis \\ tomstdenis@iahu.ca}
\maketitle \maketitle
\newpage \newpage
\section{Introduction} \section{Introduction}
``LibTomMath'' is a free and open source library that provides multiple-precision integer functions required to form a basis ``LibTomMath'' is a free and open source library that provides multiple-precision integer functions required to form a
of a public key cryptosystem. LibTomMath is written entire in portable ISO C source code and designed to have an application basis of a public key cryptosystem. LibTomMath is written entire in portable ISO C source code and designed to have an
interface much like that of MPI from Michael Fromberger. application interface much like that of MPI from Michael Fromberger.
LibTomMath was written from scratch by Tom St Denis but designed to be drop in replacement for the MPI package. The LibTomMath was written from scratch by Tom St Denis but designed to be drop in replacement for the MPI package. The
algorithms within the library are derived from descriptions as provided in the Handbook of Applied Cryptography and Knuth's algorithms within the library are derived from descriptions as provided in the Handbook of Applied Cryptography and Knuth's
@ -23,8 +23,7 @@ LibTomMath was designed with the following goals in mind:
\item Be written entirely in portable C. \item Be written entirely in portable C.
\end{enumerate} \end{enumerate}
All three goals have been achieved. Particularly the speed increase goal. For example, a 512-bit modular exponentiation All three goals have been achieved to one extent or another (actual figures depend on what platform you are using).
is eight times faster\footnote{On an Athlon XP with GCC 3.2} with LibTomMath compared to MPI.
Being compatible with MPI means that applications that already use it can be ported fairly quickly. Currently there are Being compatible with MPI means that applications that already use it can be ported fairly quickly. Currently there are
a few differences but there are many similarities. In fact the average MPI based application can be ported in under 15 a few differences but there are many similarities. In fact the average MPI based application can be ported in under 15
@ -54,16 +53,26 @@ make install
Now within your application include ``tommath.h'' and link against libtommath.a to get MPI-like functionality. Now within your application include ``tommath.h'' and link against libtommath.a to get MPI-like functionality.
\subsection{Microsoft Visual C++}
A makefile is also provided for MSVC (\textit{tested against MSVC 6.00 with SP5}) which allows the library to be used
with that compiler as well. To build the library type
\begin{verbatim}
nmake -f makefile.msvc
\end{verbatim}
Which will build ``tommath.lib''.
\section{Programming with LibTomMath} \section{Programming with LibTomMath}
\subsection{The mp\_int Structure} \subsection{The mp\_int Structure}
All multiple precision integers are stored in a structure called \textbf{mp\_int}. A multiple precision integer is All multiple precision integers are stored in a structure called \textbf{mp\_int}. A multiple precision integer is
essentially an array of \textbf{mp\_digit}. mp\_digit is defined at the top of bn.h. Its type can be changed to suit essentially an array of \textbf{mp\_digit}. mp\_digit is defined at the top of ``tommath.h''. The type can be changed
a particular platform. to suit a particular platform.
For example, when \textbf{MP\_8BIT} is defined\footnote{When building bn.c.} a mp\_digit is a unsigned char and holds For example, when \textbf{MP\_8BIT} is defined a mp\_digit is a unsigned char and holds seven bits. Similarly
seven bits. Similarly when \textbf{MP\_16BIT} is defined a mp\_digit is a unsigned short and holds 15 bits. when \textbf{MP\_16BIT} is defined a mp\_digit is a unsigned short and holds 15 bits. By default a mp\_digit is a
By default a mp\_digit is a unsigned long and holds 28 bits. unsigned long and holds 28 bits which is optimal for most 32 and 64 bit processors.
The choice of digit is particular to the platform at hand and what available multipliers are provided. For The choice of digit is particular to the platform at hand and what available multipliers are provided. For
MP\_8BIT either a $8 \times 8 \Rightarrow 16$ or $16 \times 16 \Rightarrow 16$ multiplier is optimal. When MP\_8BIT either a $8 \times 8 \Rightarrow 16$ or $16 \times 16 \Rightarrow 16$ multiplier is optimal. When
@ -83,20 +92,19 @@ $W$ is the number of bits in a digit (default is 28).
\subsection{Calling Functions} \subsection{Calling Functions}
Most functions expect pointers to mp\_int's as parameters. To save on memory usage it is possible to have source Most functions expect pointers to mp\_int's as parameters. To save on memory usage it is possible to have source
variables as destinations. For example: variables as destinations. The arguments are read left to right so to compute $x + y = z$ you would pass the arguments
in the order $x, y, z$. For example:
\begin{verbatim} \begin{verbatim}
mp_add(&x, &y, &x); /* x = x + y */ mp_add(&x, &y, &x); /* x = x + y */
mp_mul(&x, &z, &x); /* x = x * z */ mp_mul(&y, &x, &z); /* z = y * x */
mp_div_2(&x, &x); /* x = x / 2 */ mp_div_2(&x, &y); /* y = x / 2 */
\end{verbatim} \end{verbatim}
\section{Quick Overview} \subsection{Return Values}
All functions that return errors will return \textbf{MP\_OKAY} if the function was successful. A function will return
\textbf{MP\_MEM} if it ran out of heap memory or \textbf{MP\_VAL} if one of the arguments is out of range.
\subsection{Basic Functionality} \subsection{Basic Functionality}
Essentially all LibTomMath functions return one of three values to indicate if the function worked as desired. A
function will return \textbf{MP\_OKAY} if the function was successful. A function will return \textbf{MP\_MEM} if
it ran out of memory and \textbf{MP\_VAL} if the input was invalid.
Before an mp\_int can be used it must be initialized with Before an mp\_int can be used it must be initialized with
\begin{verbatim} \begin{verbatim}
@ -106,7 +114,7 @@ int mp_init(mp_int *a);
For example, consider the following. For example, consider the following.
\begin{verbatim} \begin{verbatim}
#include "bn.h" #include "tommath.h"
int main(void) int main(void)
{ {
mp_int num; mp_int num;
@ -383,6 +391,18 @@ in $c$ and returns success.
This function requires $O(N)$ additional digits of memory and $O(2 \cdot N)$ time. This function requires $O(N)$ additional digits of memory and $O(2 \cdot N)$ time.
\subsubsection{mp\_mul\_2(mp\_int *a, mp\_int *b)}
Multiplies $a$ by two and stores the result in $b$. This function is hard coded to do a shift by one place so it is faster
than calling mp\_mul\_2d with a count of one.
This function requires $O(N)$ additional digits of memory and $O(N)$ time.
\subsubsection{mp\_div\_2(mp\_int *a, mp\_int *b)}
Divides $a$ by two and stores the result in $b$. This function is hard coded to do a shift by one place so it is faster
than calling mp\_div\_2d with a count of one.
This function requires $O(N)$ additional digits of memory and $O(N)$ time.
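The single-place shift described for mp\_mul\_2 and mp\_div\_2 can be sketched as follows. This is a minimal stand-alone illustration, not the library's code: the names \texttt{toy\_mul\_2} and \texttt{toy\_div\_2} are hypothetical, and it assumes little-endian arrays of 28-bit digits stored in 32-bit words, with no sign handling and no resizing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIGIT_BIT  28
#define DIGIT_MASK ((((uint32_t)1) << DIGIT_BIT) - 1)

/* b = a * 2 over n little-endian 28-bit digits (a and b may alias).
 * Returns the bit carried out of the top digit (0 or 1). */
uint32_t toy_mul_2(const uint32_t *a, uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t next = a[i] >> (DIGIT_BIT - 1);   /* top bit moves up */
        b[i] = ((a[i] << 1) | carry) & DIGIT_MASK;
        carry = next;
    }
    return carry;
}

/* b = a / 2 over n little-endian 28-bit digits (a and b may alias).
 * Returns the bit shifted out at the bottom (the remainder). */
uint32_t toy_div_2(const uint32_t *a, uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = n; i-- > 0; ) {
        uint32_t next = a[i] & 1;                  /* low bit moves down */
        b[i] = (a[i] >> 1) | (carry << (DIGIT_BIT - 1));
        carry = next;
    }
    return carry;
}
```

Each loop touches every digit exactly once, which is where the $O(N)$ time figure comes from.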
\subsubsection{mp\_mod\_2d(mp\_int *a, int b, mp\_int *c)} \subsubsection{mp\_mod\_2d(mp\_int *a, int b, mp\_int *c)}
Performs the action of reducing $a$ modulo $2^b$ and stores the result in $c$. If the shift count $b$ is less than Performs the action of reducing $a$ modulo $2^b$ and stores the result in $c$. If the shift count $b$ is less than
or equal to zero the function places $a$ in $c$ and returns success. or equal to zero the function places $a$ in $c$ and returns success.
@ -412,7 +432,7 @@ of $c$ is the maximum length of the two inputs.
\subsection{Basic Arithmetic} \subsection{Basic Arithmetic}
\subsubsection{mp\_cmp(mp\_int *a, mp\_int *b)} \subsubsection{mp\_cmp(mp\_int *a, mp\_int *b)}
Performs a \textbf{signed} comparison between $a$ and $b$ returning \textbf{MP\_GT} is $a$ is larger than $b$. Performs a \textbf{signed} comparison between $a$ and $b$ returning \textbf{MP\_GT} if $a$ is larger than $b$.
This function requires no additional memory and $O(N)$ time. This function requires no additional memory and $O(N)$ time.
@ -559,57 +579,6 @@ A very useful observation is that multiplying by $R = \beta^n$ amounts to perfor
requires no single precision multiplications. requires no single precision multiplications.
\section{Timing Analysis} \section{Timing Analysis}
\subsection{Observed Timings}
A simple test program ``demo.c'' was developed which builds with either MPI or LibTomMath (without modification). The
test was conducted on an AMD Athlon XP processor with 266Mhz DDR memory and the GCC 3.2 compiler\footnote{With build
options ``-O3 -fomit-frame-pointer -funroll-loops''}. The multiplications and squarings were repeated 100,000 times
each while the modular exponentiation (exptmod) were performed 50 times each. The ``inversions'' refers to multiplicative
inversions modulo an odd number of a given size. The RDTSC (Read Time Stamp Counter) instruction was used to measure the
time the entire iterations took and was divided by the number of iterations to get an average. The following results
were observed.
\begin{small}
\begin{center}
\begin{tabular}{c|c|c|c}
\hline \textbf{Operation} & \textbf{Size (bits)} & \textbf{Time with MPI (cycles)} & \textbf{Time with LibTomMath (cycles)} \\
\hline
Inversion & 128 & 264,083 & 59,782 \\
Inversion & 256 & 549,370 & 146,915 \\
Inversion & 512 & 1,675,975 & 367,172 \\
Inversion & 1024 & 5,237,957 & 1,054,158 \\
Inversion & 2048 & 17,871,944 & 3,459,683 \\
Inversion & 4096 & 66,610,468 & 11,834,556 \\
\hline
Multiply & 128 & 1,426 & 451 \\
Multiply & 256 & 2,551 & 958 \\
Multiply & 512 & 7,913 & 2,476 \\
Multiply & 1024 & 28,496 & 7,927 \\
Multiply & 2048 & 109,897 & 28,224 \\
Multiply & 4096 & 469,970 & 101,171 \\
\hline
Square & 128 & 1,319 & 511 \\
Square & 256 & 1,776 & 947 \\
Square & 512 & 5,399 & 2,153 \\
Square & 1024 & 18,991 & 5,733 \\
Square & 2048 & 72,126 & 17,621 \\
Square & 4096 & 306,269 & 67,576 \\
\hline
Exptmod & 512 & 32,021,586 & 3,118,435 \\
Exptmod & 768 & 97,595,492 & 8,493,633 \\
Exptmod & 1024 & 223,302,532 & 17,715,899 \\
Exptmod & 2048 & 1,682,223,369 & 114,936,361 \\
Exptmod & 2560 & 3,268,615,571 & 229,402,426 \\
Exptmod & 3072 & 5,597,240,141 & 367,403,840 \\
Exptmod & 4096 & 13,347,270,891 & 779,058,433
\end{tabular}
\end{center}
\end{small}
Note that the figures do fluctuate but their magnitudes are relatively intact. The purpose of the chart is not to
get an exact timing but to compare the two libraries. For example, in all of the tests the exact time for a 512-bit
squaring operation was not the same. The observed times were all approximately 2,500 cycles, more importantly they
were always faster than the timings observed with MPI by about the same magnitude.
\subsection{Digit Size} \subsection{Digit Size}
The first major contribution to the time savings is the fact that 28 bits are stored per digit instead of the MPI The first major contribution to the time savings is the fact that 28 bits are stored per digit instead of the MPI
@ -619,29 +588,59 @@ A savings of $64^2 - 37^2 = 2727$ single precision multiplications.
\subsection{Multiplication Algorithms} \subsection{Multiplication Algorithms}
For most inputs a typical baseline $O(n^2)$ multiplier is used which is similar to that of MPI. There are two variants For most inputs a typical baseline $O(n^2)$ multiplier is used which is similar to that of MPI. There are two variants
of the baseline multiplier. The normal and the fast variants. The normal baseline multiplier is the exact same as the of the baseline multiplier. The normal and the fast comba variant. The normal baseline multiplier is the exact same as
algorithm from MPI. The fast baseline multiplier is optimized for cases where the number of input digits $N$ is less the algorithm from MPI. The fast comba baseline multiplier is optimized for cases where the number of input digits $N$
than or equal to $2^{w}/\beta^2$. Where $w$ is the number of bits in a \textbf{mp\_word}. By default a mp\_word is is less than or equal to $2^{w}/\beta^2$. Where $w$ is the number of bits in a \textbf{mp\_word} or simply $lg(\beta)$.
64-bits which means $N \le 256$ is allowed which represents numbers upto $7168$ bits. By default a mp\_word is 64-bits which means $N \le 256$ is allowed which represents numbers upto $7,168$ bits. However,
since the Karatsuba multiplier (discussed below) will kick in before that size the slower baseline algorithm (that MPI
uses) should never really be used in a default configuration.
The fast baseline multiplier is optimized by removing the carry operations from the inner loop. This is often referred The fast comba baseline multiplier is optimized by removing the carry operations from the inner loop. This is often
to as the ``comba'' method since it computes the products a columns first then figures out the carries. This has the referred to as the ``comba'' method since it computes the products in columns first then figures out the carries. To
effect of making a very simple and paralizable inner loop. accommodate this the result of the inner multiplications must be stored in words large enough not to lose the carry bits.
This is why there is a limit of $2^{w}/\beta^2$ digits in the input. This optimization has the effect of making a
very simple and efficient inner loop.
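The column-first idea can be sketched in a few lines. This is an illustrative stand-alone routine under stated assumptions, not the library's implementation: \texttt{toy\_comba\_mul} and \texttt{COMBA\_MAX} are hypothetical names, digits are 28 bits held in 32-bit words, and columns are accumulated in 64-bit words, which gives the limit of $2^{64}/(2^{28})^2 = 256$ input digits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIGIT_BIT  28
#define DIGIT_MASK ((((uint64_t)1) << DIGIT_BIT) - 1)
#define COMBA_MAX  256   /* 2^64 / (2^28)^2 digits per input */

/* c = a * b, where a and b hold n little-endian 28-bit digits each
 * and c receives 2n digits. c must not alias a or b. */
void toy_comba_mul(const uint32_t *a, const uint32_t *b, uint32_t *c, size_t n)
{
    uint64_t W[2 * COMBA_MAX];   /* one wide accumulator per output column */
    uint64_t carry = 0;
    size_t i, j, k;

    assert(n >= 1 && n <= COMBA_MAX);

    for (k = 0; k < 2 * n; k++) {
        W[k] = 0;
    }

    /* inner loop: multiply and accumulate only, no carry handling */
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            W[i + j] += (uint64_t)a[i] * b[j];
        }
    }

    /* a single pass afterwards propagates the carries */
    for (k = 0; k < 2 * n; k++) {
        W[k] += carry;
        c[k] = (uint32_t)(W[k] & DIGIT_MASK);
        carry = W[k] >> DIGIT_BIT;
    }
}
```

With at most 256 products of 56 bits per column, the accumulators cannot overflow 64 bits even after the incoming carry is added.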
For large inputs, typically 80 digits\footnote{By default that is 2240-bits or more.} or more the Karatsuba method is \subsubsection{Karatsuba Multiplier}
used. This method has significant overhead but an asymptotic running time of $O(n^{1.584})$ which means for fairly large For large inputs, typically 80 digits\footnote{By default that is 2240-bits or more.} or more the Karatsuba multiplication
inputs this method is faster. The Karatsuba implementation is recursive which means for extremely large inputs they method is used. This method has significant overhead but an asymptotic running time of $O(n^{1.584})$ which means for
will benefit from the algorithm. fairly large inputs this method is faster than the baseline (or comba) algorithm. The Karatsuba implementation is
recursive which means for extremely large inputs they will benefit from the algorithm.
The algorithm is based on the observation that if
\begin{eqnarray}
x = x_0 + x_1\beta \nonumber \\
y = y_0 + y_1\beta
\end{eqnarray}
Where $x_0, x_1, y_0, y_1$ are half the size of their respective summand then
\begin{equation}
x \cdot y = x_1y_1\beta^2 + (x_0y_0 + x_1y_1 - (x_1 - x_0)(y_1 - y_0))\beta + x_0y_0
\end{equation}
It follows that only three products have to be computed: $x_0y_0$, $x_1y_1$ and $(x_1-x_0)(y_1-y_0)$, which
are all products of half size numbers. A multiplication of two half size numbers requires only $1 \over 4$ of the
original work which means with no recursion the Karatsuba algorithm achieves a running time of ${3n^2}\over 4$.
The routine provided does recursion which is where the $O(n^{1.584})$ work factor comes from.
The multiplications by $\beta$ and $\beta^2$ amount to digit shift operations.
The extra overhead in the Karatsuba method comes from extracting the half size numbers $x_0, x_1, y_0, y_1$ and
performing the various smaller calculations.
The library has been fairly well optimized to extract the digits using hard-coded routines instead of the higher
level functions, however there is still significant overhead to optimize away.
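A single level of the Karatsuba split can be checked on machine words. The sketch below is only an illustration under stated assumptions: \texttt{toy\_karatsuba32} is a hypothetical name, the inputs are 32-bit values split into 16-bit halves so that $\beta = 2^{16}$, and the middle term is formed as $x_1y_0 + x_0y_1 = x_0y_0 + x_1y_1 - (x_1 - x_0)(y_1 - y_0)$.

```c
#include <assert.h>
#include <stdint.h>

/* One Karatsuba level on 32-bit inputs with beta = 2^16.
 * Only three half-size products are formed. */
uint64_t toy_karatsuba32(uint32_t x, uint32_t y)
{
    /* signed 64-bit so that the differences below may go negative */
    int64_t x0 = x & 0xFFFFu, x1 = x >> 16;
    int64_t y0 = y & 0xFFFFu, y1 = y >> 16;

    int64_t lo  = x0 * y0;                          /* x0*y0 */
    int64_t hi  = x1 * y1;                          /* x1*y1 */
    int64_t mid = lo + hi - (x1 - x0) * (y1 - y0);  /* x1*y0 + x0*y1 */

    assert(mid >= 0);

    /* multiplying by beta and beta^2 is just a shift */
    return ((uint64_t)hi << 32) + ((uint64_t)mid << 16) + (uint64_t)lo;
}
```

In a multiple precision setting the same three products would themselves be computed recursively, and the shifts become digit shifts.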
MPI only implements the slower baseline multiplier where carries are dealt with in the inner loop. As a result even at MPI only implements the slower baseline multiplier where carries are dealt with in the inner loop. As a result even at
smaller numbers (below the Karatsuba cutoff) the LibTomMath multipliers are faster. smaller numbers (below the Karatsuba cutoff) the LibTomMath multipliers are faster.
\subsection{Squaring Algorithms} \subsection{Squaring Algorithms}
Similar to the multiplication algorithms there are two baseline squaring algorithms. Both have an asymptotic running Similar to the multiplication algorithms there are two baseline squaring algorithms. Both have an asymptotic
time of $O((t^2 + t)/2)$. The normal baseline squaring is the same from MPI and the fast is a ``comba'' squaring running time of $O((t^2 + t)/2)$. The normal baseline squaring is the same from MPI and the fast method is
algorithm. The comba method is used if the number of digits $N$ is less than $2^{w-1}/\beta^2$ which by default a ``comba'' squaring algorithm. The comba method is used if the number of digits $N$ is less than
covers numbers upto $3584$ bits. $2^{w-1}/\beta^2$ which by default covers numbers upto $3,584$ bits.
There is also a Karatsuba squaring method which achieves a running time of $O(n^{1.584})$ after considerably large There is also a Karatsuba squaring method which achieves a running time of $O(n^{1.584})$ after considerably large
inputs. inputs.
@ -653,25 +652,31 @@ than MPI is.
LibTomMath implements a sliding window $k$-ary left to right exponentiation algorithm. For a given exponent size $L$ an LibTomMath implements a sliding window $k$-ary left to right exponentiation algorithm. For a given exponent size $L$ an
appropriate window size $k$ is chosen. There are always at most $L$ modular squarings and $\lfloor L/k \rfloor$ modular appropriate window size $k$ is chosen. There are always at most $L$ modular squarings and $\lfloor L/k \rfloor$ modular
multiplications. The $k$-ary method works by precomputing values $g(x) = b^x$ for $0 \le x < 2^k$ and a given base multiplications. The $k$-ary method works by precomputing values $g(x) = b^x$ for $2^{k-1} \le x < 2^k$ and a given base
$b$. Then the multiplications are grouped in windows of $k$ bits. The sliding window technique has the benefit $b$. Then the multiplications are grouped in windows of $k$ bits. The sliding window technique has the benefit
that it can skip multiplications if there are zero bits following or preceding a window. Consider the exponent that it can skip multiplications if there are zero bits following or preceding a window. Consider the exponent
$e = 11110001_2$ if $k = 2$ then there will be a two squarings, a multiplication of $g(3)$, two squarings, a multiplication $e = 11110001_2$ if $k = 2$ then there will be a two squarings, a multiplication of $g(3)$, two squarings, a multiplication
of $g(3)$, four squarings and a multiplication by $g(1)$. In total there are 8 squarings and 3 multiplications. of $g(3)$, four squarings and a multiplication by $g(1)$. In total there are 8 squarings and 3 multiplications.
MPI uses a binary square-multiply method. For the same exponent $e$ it would have had 8 squarings and 5 multiplications. MPI uses a binary square-multiply method for exponentiation. For the same exponent $e = 11110001_2$ it would have had to
There is a precomputation phase for the method LibTomMath uses but it generally cuts down considerably on the number perform 8 squarings and 5 multiplications. There is a precomputation phase for the method LibTomMath uses but it
of multiplications. Consider a 512-bit exponent. The worst case for the LibTomMath method results in 512 squarings and generally cuts down considerably on the number of multiplications. Consider a 512-bit exponent. The worst case for the
124 multiplications. The MPI method would have 512 squarings and 512 multiplications. Randomly every $2k$ bits another LibTomMath method results in 512 squarings and 124 multiplications. The MPI method would have 512 squarings
multiplication is saved via the sliding-window technique on top of the savings the $k$-ary method provides. and 512 multiplications. Randomly every $2k$ bits another multiplication is saved via the sliding-window
technique on top of the savings the $k$-ary method provides.
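The operation counts quoted for $e = 11110001_2$ can be reproduced with a small word-sized model. This is a sketch under stated assumptions rather than the library's routine: the names are hypothetical, the modulus must fit in 32 bits, plain \texttt{\%} stands in for Barrett or Montgomery reduction, and the window rule is the textbook one in which a window starts and ends on a one bit and only odd powers are precomputed.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { unsigned squarings, multiplications; } op_count;

/* b^e mod m by sliding window of width k (1 <= k <= 4), m < 2^32.
 * Precomputation is not counted in ops. */
uint64_t toy_sliding_exptmod(uint64_t b, uint32_t e, uint64_t m, int k, op_count *ops)
{
    uint64_t g[16];              /* g[x] = b^x mod m, odd x only */
    uint64_t b2, acc;
    int i, j, l;

    assert(k >= 1 && k <= 4 && m >= 1 && m <= 0xFFFFFFFFu);
    ops->squarings = ops->multiplications = 0;

    g[1] = b % m;
    b2 = (g[1] * g[1]) % m;
    for (j = 3; j < (1 << k); j += 2) {
        g[j] = (g[j - 2] * b2) % m;
    }

    acc = 1 % m;
    i = 31;
    while (i >= 0 && ((e >> i) & 1u) == 0) i--;     /* skip leading zeros */

    while (i >= 0) {
        if (((e >> i) & 1u) == 0) {                 /* zero bit: square only */
            acc = (acc * acc) % m; ops->squarings++;
            i--;
        } else {
            l = i - k + 1;
            if (l < 0) l = 0;
            while (((e >> l) & 1u) == 0) l++;       /* window ends on a one */
            for (j = 0; j < i - l + 1; j++) {
                acc = (acc * acc) % m; ops->squarings++;
            }
            acc = (acc * g[(e >> l) & ((1u << (i - l + 1)) - 1u)]) % m;
            ops->multiplications++;
            i = l - 1;
        }
    }
    return acc;
}

/* b^e mod m by the binary square-multiply method, for comparison. */
uint64_t toy_binary_exptmod(uint64_t b, uint32_t e, uint64_t m, op_count *ops)
{
    uint64_t acc = 1 % m;
    int i = 31;

    assert(m >= 1 && m <= 0xFFFFFFFFu);
    ops->squarings = ops->multiplications = 0;
    b %= m;
    while (i >= 0 && ((e >> i) & 1u) == 0) i--;
    for (; i >= 0; i--) {
        acc = (acc * acc) % m; ops->squarings++;
        if ((e >> i) & 1u) {
            acc = (acc * b) % m; ops->multiplications++;
        }
    }
    return acc;
}
```

For $e = 11110001_2$ and $k = 2$ the first routine counts 8 squarings and 3 multiplications, while the binary method counts 8 squarings and 5 multiplications.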
Both LibTomMath and MPI use Barrett reduction instead of division to reduce the numbers modulo the modulus given. Both LibTomMath and MPI use Barrett reduction instead of division to reduce the numbers modulo the modulus given.
However, LibTomMath can take advantage of the fact that the multiplications required within the Barrett reduction However, LibTomMath can take advantage of the fact that the multiplications required within the Barrett reduction
do not have to give full precision. As a result the reduction step is much faster and just as accurate. The LibTomMath code do not have to give full precision. As a result the reduction step is much faster and just as accurate. The LibTomMath
will automatically determine at run-time (e.g. when its called) whether the faster multiplier can be used. The code will automatically determine at run-time (e.g. when its called) whether the faster multiplier can be used. The
faster multipliers have also been optimized into the two variants (baseline and comba baseline). faster multipliers have also been optimized into the two variants (baseline and comba baseline).
LibTomMath also has a variant of the exptmod function that uses Montgomery reductions instead of Barrett reductions LibTomMath also has a variant of the exptmod function that uses Montgomery reductions instead of Barrett reductions
which is faser. As a result of all these changes exponentiation in LibTomMath is much faster than compared to MPI. which is faster. The code will automatically detect when the Montgomery version can be used (\textit{Requires the
modulus to be odd and below the MONTGOMERY\_EXPT\_CUTOFF size}). The Montgomery routine is essentially a copy of the
Barrett exponentiation routine except it uses Montgomery reduction.
As a result of all these changes exponentiation in LibTomMath is much faster than in MPI. On most ALU-strong
processors (AMD Athlon for instance) exponentiation in LibTomMath is often more than ten times faster than in MPI.
\end{document} \end{document}

View File

@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>

View File

@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>
@ -100,14 +100,18 @@ fast_mp_montgomery_reduce (mp_int * a, mp_int * m, mp_digit mp)
W[ix + 1] += W[ix] >> ((mp_word) DIGIT_BIT); W[ix + 1] += W[ix] >> ((mp_word) DIGIT_BIT);
} }
/* nox fix rest of carries */
for (++ix; ix <= m->used * 2 + 1; ix++) {
W[ix] += (W[ix - 1] >> ((mp_word) DIGIT_BIT));
}
{ {
register mp_digit *tmpa; register mp_digit *tmpa;
register mp_word *_W; register mp_word *_W, *_W1;
/* now fix rest of carries */
_W1 = W + ix;
_W = W + ++ix;
for (; ix <= m->used * 2 + 1; ix++) {
*_W++ += *_W1++ >> ((mp_word) DIGIT_BIT);
}
/* copy out, A = A/b^n /* copy out, A = A/b^n
* *

View File

@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>
@ -46,6 +46,7 @@ mp_div_2 (mp_int * a, mp_int * b)
*tmpb++ = 0; *tmpb++ = 0;
} }
} }
b->sign = a->sign;
mp_clamp (b); mp_clamp (b);
return MP_OKAY; return MP_OKAY;
} }

View File

@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>
@ -51,7 +51,9 @@ mp_div_2d (mp_int * a, int b, mp_int * c, mp_int * d)
} }
/* shift by as many digits in the bit count */ /* shift by as many digits in the bit count */
if (b >= DIGIT_BIT) {
mp_rshd (c, b / DIGIT_BIT); mp_rshd (c, b / DIGIT_BIT);
}
/* shift any bit count < DIGIT_BIT */ /* shift any bit count < DIGIT_BIT */
D = (mp_digit) (b % DIGIT_BIT); D = (mp_digit) (b % DIGIT_BIT);

View File

@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>

View File

@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>
@ -37,8 +37,7 @@ int
mp_karatsuba_mul (mp_int * a, mp_int * b, mp_int * c) mp_karatsuba_mul (mp_int * a, mp_int * b, mp_int * c)
{ {
mp_int x0, x1, y0, y1, t1, t2, x0y0, x1y1; mp_int x0, x1, y0, y1, t1, t2, x0y0, x1y1;
int B, err, x; int B, err;
err = MP_MEM; err = MP_MEM;
@ -59,13 +58,13 @@ mp_karatsuba_mul (mp_int * a, mp_int * b, mp_int * c)
goto Y0; goto Y0;
/* init temps */ /* init temps */
if (mp_init (&t1) != MP_OKAY) if (mp_init_size (&t1, B * 2) != MP_OKAY)
goto Y1; goto Y1;
if (mp_init (&t2) != MP_OKAY) if (mp_init_size (&t2, B * 2) != MP_OKAY)
goto T1; goto T1;
if (mp_init (&x0y0) != MP_OKAY) if (mp_init_size (&x0y0, B * 2) != MP_OKAY)
goto T2; goto T2;
if (mp_init (&x1y1) != MP_OKAY) if (mp_init_size (&x1y1, B * 2) != MP_OKAY)
goto X0Y0; goto X0Y0;
/* now shift the digits */ /* now shift the digits */
@ -76,18 +75,32 @@ mp_karatsuba_mul (mp_int * a, mp_int * b, mp_int * c)
x1.used = a->used - B; x1.used = a->used - B;
y1.used = b->used - B; y1.used = b->used - B;
{
register int x;
register mp_digit *tmpa, *tmpb, *tmpx, *tmpy;
/* we copy the digits directly instead of using higher level functions /* we copy the digits directly instead of using higher level functions
* since we also need to shift the digits * since we also need to shift the digits
*/ */
tmpa = a->dp;
tmpb = b->dp;
tmpx = x0.dp;
tmpy = y0.dp;
for (x = 0; x < B; x++) { for (x = 0; x < B; x++) {
x0.dp[x] = a->dp[x]; *tmpx++ = *tmpa++;
y0.dp[x] = b->dp[x]; *tmpy++ = *tmpb++;
} }
tmpx = x1.dp;
for (x = B; x < a->used; x++) { for (x = B; x < a->used; x++) {
x1.dp[x - B] = a->dp[x]; *tmpx++ = *tmpa++;
} }
tmpy = y1.dp;
for (x = B; x < b->used; x++) { for (x = B; x < b->used; x++) {
y1.dp[x - B] = b->dp[x]; *tmpy++ = *tmpb++;
}
} }
/* only need to clamp the lower words since by definition the upper words x1/y1 must /* only need to clamp the lower words since by definition the upper words x1/y1 must


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>
@ -23,8 +23,7 @@ int
mp_karatsuba_sqr (mp_int * a, mp_int * b) mp_karatsuba_sqr (mp_int * a, mp_int * b)
{ {
mp_int x0, x1, t1, t2, x0x0, x1x1; mp_int x0, x1, t1, t2, x0x0, x1x1;
int B, err, x; int B, err;
err = MP_MEM; err = MP_MEM;
@ -41,22 +40,31 @@ mp_karatsuba_sqr (mp_int * a, mp_int * b)
goto X0; goto X0;
/* init temps */ /* init temps */
if (mp_init (&t1) != MP_OKAY) if (mp_init_size (&t1, a->used * 2) != MP_OKAY)
goto X1; goto X1;
if (mp_init (&t2) != MP_OKAY) if (mp_init_size (&t2, a->used * 2) != MP_OKAY)
goto T1; goto T1;
if (mp_init (&x0x0) != MP_OKAY) if (mp_init_size (&x0x0, B * 2) != MP_OKAY)
goto T2; goto T2;
if (mp_init (&x1x1) != MP_OKAY) if (mp_init_size (&x1x1, (a->used - B) * 2) != MP_OKAY)
goto X0X0; goto X0X0;
{
register int x;
register mp_digit *dst, *src;
src = a->dp;
/* now shift the digits */ /* now shift the digits */
dst = x0.dp;
for (x = 0; x < B; x++) { for (x = 0; x < B; x++) {
x0.dp[x] = a->dp[x]; *dst++ = *src++;
} }
dst = x1.dp;
for (x = B; x < a->used; x++) { for (x = B; x < a->used; x++) {
x1.dp[x - B] = a->dp[x]; *dst++ = *src++;
}
} }
x0.used = B; x0.used = B;
@ -77,7 +85,7 @@ mp_karatsuba_sqr (mp_int * a, mp_int * b)
goto X1X1; /* t1 = (x1 - x0) * (y1 - y0) */ goto X1X1; /* t1 = (x1 - x0) * (y1 - y0) */
/* add x0y0 */ /* add x0y0 */
if (mp_add (&x0x0, &x1x1, &t2) != MP_OKAY) if (s_mp_add (&x0x0, &x1x1, &t2) != MP_OKAY)
goto X1X1; /* t2 = x0y0 + x1y1 */ goto X1X1; /* t2 = x0y0 + x1y1 */
if (mp_sub (&t2, &t1, &t1) != MP_OKAY) if (mp_sub (&t2, &t1, &t1) != MP_OKAY)
goto X1X1; /* t1 = x0y0 + x1y1 - (x1-x0)*(y1-y0) */ goto X1X1; /* t1 = x0y0 + x1y1 - (x1-x0)*(y1-y0) */
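The comments in this hunk describe the Karatsuba middle term as `x0y0 + x1y1 - (x1-x0)*(y1-y0)`, which equals `x0*y1 + x1*y0` and so needs only three multiplications in total. A small sketch of the same identity on machine words follows, assuming 32-bit inputs split into 16-bit halves. The function name `karatsuba32` is hypothetical and the code is not taken from the library.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 32-bit values using three half-width multiplications. */
static uint64_t karatsuba32(uint32_t x, uint32_t y)
{
    uint32_t x0 = x & 0xFFFFu, x1 = x >> 16;   /* x = x1 * 2^16 + x0 */
    uint32_t y0 = y & 0xFFFFu, y1 = y >> 16;   /* y = y1 * 2^16 + y0 */

    uint64_t x0y0 = (uint64_t)x0 * y0;
    uint64_t x1y1 = (uint64_t)x1 * y1;

    /* t1 = (x1 - x0) * (y1 - y0); signed because either factor may be negative */
    int64_t t1 = ((int64_t)x1 - (int64_t)x0) * ((int64_t)y1 - (int64_t)y0);

    /* middle term: x0y0 + x1y1 - t1 == x0*y1 + x1*y0, always non-negative */
    uint64_t mid = (uint64_t)((int64_t)(x0y0 + x1y1) - t1);

    /* debug-only cross-check against the schoolbook middle term */
    assert(mid == (uint64_t)x0 * y1 + (uint64_t)x1 * y0);

    return (x1y1 << 32) + (mid << 16) + x0y0;
}
```

For squaring, `y` is simply `x`, so `t1` becomes `(x1 - x0)^2` and the two outer products become squares.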


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>
@ -31,16 +31,31 @@ mp_lshd (mp_int * a, int b)
return res; return res;
} }
{
register mp_digit *tmpa, *tmpaa;
/* increment the used by the shift amount then copy upwards */ /* increment the used by the shift amount then copy upwards */
a->used += b; a->used += b;
/* top */
tmpa = a->dp + a->used - 1;
/* base */
tmpaa = a->dp + a->used - 1 - b;
/* much like mp_rshd this is implemented using a sliding window
* except the window goes the other way around, copying from
* the bottom to the top. See bn_mp_rshd.c for more info.
*/
for (x = a->used - 1; x >= b; x--) { for (x = a->used - 1; x >= b; x--) {
a->dp[x] = a->dp[x - b]; *tmpa-- = *tmpaa--;
} }
/* zero the lower digits */ /* zero the lower digits */
tmpa = a->dp;
for (x = 0; x < b; x++) { for (x = 0; x < b; x++) {
a->dp[x] = 0; *tmpa++ = 0;
}
} }
mp_clamp (a);
return MP_OKAY; return MP_OKAY;
} }
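The pointer-aliased left shift above moves digits upward, so it has to copy from the top down to avoid overwriting digits it has not read yet. A minimal sketch of that copy order on a plain array follows. The name `shift_left_digits`, the explicit `alloc` capacity argument and the 32-bit digit type are assumptions for the illustration. It uses pre-decrement so that no pointer is ever formed below the start of the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift dp[0..*used-1] up by b digits in place; alloc is the capacity of dp. */
static void shift_left_digits(uint32_t *dp, size_t *used, size_t alloc, size_t b)
{
    uint32_t *src, *dst;

    assert(dp != NULL && *used + b <= alloc);   /* caller must have grown dp */

    src = dp + *used;    /* one past the old top digit */
    *used += b;
    dst = dp + *used;    /* one past the new top digit */

    /* copy from the top down so no digit is overwritten before it is read */
    while (src != dp) {
        *--dst = *--src;
    }

    /* zero the b low digits that were vacated */
    while (dst != dp) {
        *--dst = 0;
    }
}
```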


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>
@ -18,36 +18,29 @@
int int
mp_montgomery_setup (mp_int * a, mp_digit * mp) mp_montgomery_setup (mp_int * a, mp_digit * mp)
{ {
mp_int t, tt; unsigned long x, b;
int res;
if ((res = mp_init (&t)) != MP_OKAY) { /* fast inversion mod 2^32
return res; *
* Based on the fact that
*
* XA = 1 (mod 2^n) => (X(2-XA)) A = 1 (mod 2^2n)
* => 2*X*A - X*X*A*A = 1
* => 2*(1) - (1) = 1
*/
b = a->dp[0];
if ((b & 1) == 0) {
return MP_VAL;
} }
if ((res = mp_init (&tt)) != MP_OKAY) { x = (((b + 2) & 4) << 1) + b; /* here x*a==1 mod 2^4 */
goto __T; x *= 2 - b * x; /* here x*a==1 mod 2^8 */
} x *= 2 - b * x; /* here x*a==1 mod 2^16; each step doubles the number of bits */
x *= 2 - b * x; /* here x*a==1 mod 2^32 */
/* tt = b */
tt.dp[0] = 0;
tt.dp[1] = 1;
tt.used = 2;
/* t = m mod b */
t.dp[0] = a->dp[0];
t.used = 1;
/* t = 1/m mod b */
if ((res = mp_invmod (&t, &tt, &t)) != MP_OKAY) {
goto __TT;
}
/* t = -1/m mod b */ /* t = -1/m mod b */
*mp = ((mp_digit) 1 << ((mp_digit) DIGIT_BIT)) - t.dp[0]; *mp = ((mp_digit) 1 << ((mp_digit) DIGIT_BIT)) - (x & MP_MASK);
res = MP_OKAY; return MP_OKAY;
__TT:mp_clear (&tt);
__T:mp_clear (&t);
return res;
} }
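The new `mp_montgomery_setup` replaces a full `mp_invmod` call with a word-sized Newton iteration. If `x*b = 1 + k*2^n`, then `x*(2 - b*x)*b = (1 + k*2^n)*(1 - k*2^n) = 1 - k^2*2^(2n)`, so each step doubles the number of correct low bits. A standalone sketch of the same iteration follows, assuming `uint32_t` arithmetic. The helper names `inv_mod_2_32` and `montgomery_rho` are hypothetical and are not part of the library.

```c
#include <assert.h>
#include <stdint.h>

/* Return x with b * x == 1 (mod 2^32); b must be odd. */
static uint32_t inv_mod_2_32(uint32_t b)
{
    uint32_t x;

    assert(b & 1u);                      /* even values have no inverse mod 2^k */

    x = (((b + 2u) & 4u) << 1) + b;      /* x*b == 1 mod 2^4  */
    x *= 2u - b * x;                     /* x*b == 1 mod 2^8  */
    x *= 2u - b * x;                     /* x*b == 1 mod 2^16 */
    x *= 2u - b * x;                     /* x*b == 1 mod 2^32 */
    return x;
}

/* Return rho = -1/b mod 2^digit_bit, for 1 <= digit_bit <= 32. */
static uint32_t montgomery_rho(uint32_t b, unsigned digit_bit)
{
    uint32_t mask = (digit_bit == 32) ? 0xFFFFFFFFu
                                      : ((UINT32_C(1) << digit_bit) - 1u);
    return (0u - inv_mod_2_32(b)) & mask;
}
```

An inverse modulo `2^32` is also an inverse modulo any smaller power of two, which is why masking down to the digit width at the end is enough.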


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>
@ -50,6 +50,11 @@ mp_mul_2 (mp_int * a, mp_int * b)
if ((res = mp_grow (b, b->used + 1)) != MP_OKAY) { if ((res = mp_grow (b, b->used + 1)) != MP_OKAY) {
return res; return res;
} }
/* after the grow *tmpb is no longer valid so we have to reset it!
* (this bug took me about 17 minutes to find...!)
*/
tmpb = b->dp + b->used;
} }
/* add a MSB of 1 */ /* add a MSB of 1 */
*tmpb = 1; *tmpb = 1;
@ -61,5 +66,6 @@ mp_mul_2 (mp_int * a, mp_int * b)
*tmpb++ = 0; *tmpb++ = 0;
} }
} }
b->sign = a->sign;
return MP_OKAY; return MP_OKAY;
} }
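The `mp_mul_2` fix above is a general hazard with cached digit pointers: once the buffer is reallocated, any pointer derived from the old `dp` may dangle and has to be recomputed. A minimal sketch of the pattern follows. The `bigint` struct and the `bigint_grow` and `bigint_push_carry` helpers are hypothetical stand-ins for `mp_int` and `mp_grow`, not the library's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* hypothetical stand-in for mp_int */
typedef struct {
    uint32_t *dp;      /* digits, least significant first */
    size_t used;       /* digits in use */
    size_t alloc;      /* digits allocated */
} bigint;

/* Ensure room for at least size digits; returns 0 on success, -1 on failure. */
static int bigint_grow(bigint *a, size_t size)
{
    if (a->alloc < size) {
        uint32_t *tmp = realloc(a->dp, size * sizeof *tmp);
        if (tmp == NULL) {
            return -1;             /* a->dp is still valid and unchanged */
        }
        a->dp = tmp;               /* the block may have moved */
        a->alloc = size;
    }
    return 0;
}

/* Append a most significant digit of 1, growing the buffer if needed. */
static int bigint_push_carry(bigint *a)
{
    uint32_t *top;

    assert(a != NULL && a->dp != NULL);

    top = a->dp + a->used;         /* cached pointer to the new top digit */
    if (a->alloc < a->used + 1) {
        if (bigint_grow(a, a->used + 1) != 0) {
            return -1;
        }
        top = a->dp + a->used;     /* re-derive: the old pointer may dangle */
    }
    *top = 1;
    a->used++;
    return 0;
}
```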


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>
@ -32,9 +32,11 @@ mp_mul_2d (mp_int * a, int b, mp_int * c)
} }
/* shift by as many digits in the bit count */ /* shift by as many digits in the bit count */
if (b >= DIGIT_BIT) {
if ((res = mp_lshd (c, b / DIGIT_BIT)) != MP_OKAY) { if ((res = mp_lshd (c, b / DIGIT_BIT)) != MP_OKAY) {
return res; return res;
} }
}
c->used = c->alloc; c->used = c->alloc;
/* shift any bit count < DIGIT_BIT */ /* shift any bit count < DIGIT_BIT */


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>
@ -20,7 +20,6 @@ mp_rshd (mp_int * a, int b)
{ {
int x; int x;
/* if b <= 0 then ignore it */ /* if b <= 0 then ignore it */
if (b <= 0) { if (b <= 0) {
return; return;
@ -32,14 +31,34 @@ mp_rshd (mp_int * a, int b)
return; return;
} }
{
register mp_digit *tmpa, *tmpaa;
/* shift the digits down */ /* shift the digits down */
/* base */
tmpa = a->dp;
/* offset into digits */
tmpaa = a->dp + b;
/* this is implemented as a sliding window where the window is b-digits long
* and digits from the top of the window are copied to the bottom
*
* e.g.
b-2 | b-1 | b0 | b1 | b2 | ... | bb | ---->
/\ | ---->
\-------------------/ ---->
*/
for (x = 0; x < (a->used - b); x++) { for (x = 0; x < (a->used - b); x++) {
a->dp[x] = a->dp[x + b]; *tmpa++ = *tmpaa++;
} }
/* zero the top digits */ /* zero the top digits */
for (; x < a->used; x++) { for (; x < a->used; x++) {
a->dp[x] = 0; *tmpa++ = 0;
}
} }
mp_clamp (a); mp_clamp (a);
} }
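The sliding-window comment in `mp_rshd` describes two pointers that are `b` digits apart and walk upward together, followed by a pass that zeroes the vacated top digits. A short sketch of that loop on a plain array follows. The name `shift_right_digits` and the 32-bit digit type are assumptions for the illustration, and clamping of `used` is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift dp[0..used-1] down by b digits in place, zero-filling the top. */
static void shift_right_digits(uint32_t *dp, size_t used, size_t b)
{
    uint32_t *dst = dp;            /* bottom of the window */
    uint32_t *end = dp + used;     /* one past the top digit */
    uint32_t *src;

    assert(dp != NULL);

    /* top of the window; shifting by used or more leaves nothing to copy */
    src = (b < used) ? dp + b : end;

    /* slide the window upward: each digit drops b places */
    while (src != end) {
        *dst++ = *src++;
    }

    /* zero the top digits that were vacated */
    while (dst != end) {
        *dst++ = 0;
    }
}
```

Copying upward is safe here because the source pointer always runs ahead of the destination.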


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>
@ -55,8 +55,14 @@ s_mp_add (mp_int * a, mp_int * b, mp_int * c)
register int i; register int i;
/* alias for digit pointers */ /* alias for digit pointers */
/* first input */
tmpa = a->dp; tmpa = a->dp;
/* second input */
tmpb = b->dp; tmpb = b->dp;
/* destination */
tmpc = c->dp; tmpc = c->dp;
u = 0; u = 0;


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>


@ -10,10 +10,13 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#include <tommath.h> #include <tommath.h>
int KARATSUBA_MUL_CUTOFF = 80, /* Min. number of digits before Karatsuba multiplication is used. */ /* configured for an AMD Duron Morgan core with etc/tune.c */
KARATSUBA_SQR_CUTOFF = 80, /* Min. number of digits before Karatsuba squaring is used. */ int KARATSUBA_MUL_CUTOFF = 73, /* Min. number of digits before Karatsuba multiplication is used. */
MONTGOMERY_EXPT_CUTOFF = 74; /* max. number of digits that montgomery reductions will help for */ KARATSUBA_SQR_CUTOFF = 121, /* Min. number of digits before Karatsuba squaring is used. */
MONTGOMERY_EXPT_CUTOFF = 128; /* max. number of digits that montgomery reductions will help for */


@ -1,3 +1,16 @@
Mar 15th, 2003
v0.14 -- Tons of manual updates
-- cleaned up the directory
-- added MSVC makefiles
-- source changes [that I don't recall]
-- Fixed up the lshd/rshd code to use pointer aliasing
-- Fixed up the mul_2d and div_2d to not call rshd/lshd unless needed
-- Fixed up etc/tune.c a tad
-- fixed up demo/demo.c to output comma-delimited results of timing
also fixed up timing demo to use a finer granularity for various functions
-- fixed up demo/demo.c testing to pause during testing so my Duron won't catch on fire
[stays around 31-35C during testing :-)]
Feb 13th, 2003 Feb 13th, 2003
v0.13 -- tons of minor speed-ups in low level add, sub, mul_2 and div_2 which propagate v0.13 -- tons of minor speed-ups in low level add, sub, mul_2 and div_2 which propagate
to other functions like mp_invmod, mp_div, etc... to other functions like mp_invmod, mp_div, etc...


@ -69,18 +69,32 @@ int mp_reduce_setup(mp_int *a, mp_int *b)
} }
return mp_div(a, b, a, NULL); return mp_div(a, b, a, NULL);
} }
int mp_rand(mp_int *a, int c)
{
long z = abs(rand()) & 65535;
mp_set(a, z?z:1);
while (c--) {
s_mp_lshd(a, 1);
mp_add_d(a, abs(rand()), a);
}
return MP_OKAY;
}
#endif #endif
char cmd[4096], buf[4096]; char cmd[4096], buf[4096];
int main(void) int main(void)
{ {
mp_int a, b, c, d, e, f; mp_int a, b, c, d, e, f;
unsigned long expt_n, add_n, sub_n, mul_n, div_n, sqr_n, mul2d_n, div2d_n, gcd_n, lcm_n, inv_n; unsigned long expt_n, add_n, sub_n, mul_n, div_n, sqr_n, mul2d_n, div2d_n, gcd_n, lcm_n, inv_n,
div2_n, mul2_n;
unsigned rr; unsigned rr;
int cnt;
#ifdef TIMER #ifdef TIMER
int n; int n;
ulong64 tt; ulong64 tt;
FILE *log;
#endif #endif
mp_init(&a); mp_init(&a);
@ -90,60 +104,66 @@ int main(void)
mp_init(&e); mp_init(&e);
mp_init(&f); mp_init(&f);
#ifdef TIMER #ifdef TIMER
goto multtime;
printf("CLOCKS_PER_SEC == %lu\n", CLOCKS_PER_SEC); printf("CLOCKS_PER_SEC == %lu\n", CLOCKS_PER_SEC);
mp_read_radix(&a, "340282366920938463463374607431768211455", 10); goto expttime;
mp_read_radix(&b, "340282366920938463463574607431768211455", 10);
while (a.used * DIGIT_BIT < 8192) { log = fopen("add.log", "w");
for (cnt = 4; cnt <= 128; cnt += 4) {
mp_rand(&a, cnt);
mp_rand(&b, cnt);
reset(); reset();
for (rr = 0; rr < 10000000; rr++) { for (rr = 0; rr < 10000000; rr++) {
mp_add(&a, &b, &c); mp_add(&a, &b, &c);
} }
tt = rdtsc(); tt = rdtsc();
printf("Adding\t\t%4d-bit => %9llu/sec, %9llu ticks\n", mp_count_bits(&a), (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt, tt); printf("Adding\t\t%4d-bit => %9llu/sec, %9llu ticks\n", mp_count_bits(&a), (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt, tt);
mp_sqr(&a, &a); fprintf(log, "%d,%9llu\n", cnt, (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt);
mp_sqr(&b, &b);
} }
fclose(log);
mp_read_radix(&a, "340282366920938463463374607431768211455", 10); log = fopen("sub.log", "w");
mp_read_radix(&b, "340282366920938463463574607431768211455", 10); for (cnt = 4; cnt <= 128; cnt += 4) {
while (a.used * DIGIT_BIT < 8192) { mp_rand(&a, cnt);
mp_rand(&b, cnt);
reset(); reset();
for (rr = 0; rr < 10000000; rr++) { for (rr = 0; rr < 10000000; rr++) {
mp_sub(&a, &b, &c); mp_sub(&a, &b, &c);
} }
tt = rdtsc(); tt = rdtsc();
printf("Subtracting\t%4d-bit => %9llu/sec, %9llu ticks\n", mp_count_bits(&a), (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt, tt); printf("Subtracting\t\t%4d-bit => %9llu/sec, %9llu ticks\n", mp_count_bits(&a), (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt, tt);
mp_sqr(&a, &a); fprintf(log, "%d,%9llu\n", cnt, (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt);
mp_sqr(&b, &b);
} }
fclose(log);
multtime: multtime:
mp_read_radix(&a, "340282366920938463463374607431768211455", 10); log = fopen("sqr.log", "w");
while (a.used * DIGIT_BIT < 8192) { for (cnt = 4; cnt <= 128; cnt += 4) {
mp_rand(&a, cnt);
reset(); reset();
for (rr = 0; rr < 250000; rr++) { for (rr = 0; rr < 250000; rr++) {
mp_sqr(&a, &b); mp_sqr(&a, &b);
} }
tt = rdtsc(); tt = rdtsc();
printf("Squaring\t%4d-bit => %9llu/sec, %9llu ticks\n", mp_count_bits(&a), (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt, tt); printf("Squaring\t%4d-bit => %9llu/sec, %9llu ticks\n", mp_count_bits(&a), (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt, tt);
mp_copy(&b, &a); fprintf(log, "%d,%9llu\n", cnt, (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt);
} }
fclose(log);
mp_read_radix(&a, "340282366920938463463374607431768211455", 10); log = fopen("mult.log", "w");
while (a.used * DIGIT_BIT < 8192) { for (cnt = 4; cnt <= 128; cnt += 4) {
mp_rand(&a, cnt);
mp_rand(&b, cnt);
reset(); reset();
for (rr = 0; rr < 250000; rr++) { for (rr = 0; rr < 250000; rr++) {
mp_mul(&a, &a, &b); mp_mul(&a, &b, &c);
} }
tt = rdtsc(); tt = rdtsc();
printf("Multiplying\t%4d-bit => %9llu/sec, %9llu ticks\n", mp_count_bits(&a), (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt, tt); printf("Multiplying\t%4d-bit => %9llu/sec, %9llu ticks\n", mp_count_bits(&a), (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt, tt);
mp_copy(&b, &a); fprintf(log, "%d,%9llu\n", cnt, (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt);
} }
fclose(log);
expttime: expttime:
{ {
@ -157,6 +177,7 @@ expttime:
"1214855636816562637502584060163403830270705000634713483015101384881871978446801224798536155406895823305035467591632531067547890948695117172076954220727075688048751022421198712032848890056357845974246560748347918630050853933697792254955890439720297560693579400297062396904306270145886830719309296352765295712183040773146419022875165382778007040109957609739589875590885701126197906063620133954893216612678838507540777138437797705602453719559017633986486649523611975865005712371194067612263330335590526176087004421363598470302731349138773205901447704682181517904064735636518462452242791676541725292378925568296858010151852326316777511935037531017413910506921922450666933202278489024521263798482237150056835746454842662048692127173834433089016107854491097456725016327709663199738238442164843147132789153725513257167915555162094970853584447993125488607696008169807374736711297007473812256272245489405898470297178738029484459690836250560495461579533254473316340608217876781986188705928270735695752830825527963838355419762516246028680280988020401914551825487349990306976304093109384451438813251211051597392127491464898797406789175453067960072008590614886532333015881171367104445044718144312416815712216611576221546455968770801413440778423979", 
"1214855636816562637502584060163403830270705000634713483015101384881871978446801224798536155406895823305035467591632531067547890948695117172076954220727075688048751022421198712032848890056357845974246560748347918630050853933697792254955890439720297560693579400297062396904306270145886830719309296352765295712183040773146419022875165382778007040109957609739589875590885701126197906063620133954893216612678838507540777138437797705602453719559017633986486649523611975865005712371194067612263330335590526176087004421363598470302731349138773205901447704682181517904064735636518462452242791676541725292378925568296858010151852326316777511935037531017413910506921922450666933202278489024521263798482237150056835746454842662048692127173834433089016107854491097456725016327709663199738238442164843147132789153725513257167915555162094970853584447993125488607696008169807374736711297007473812256272245489405898470297178738029484459690836250560495461579533254473316340608217876781986188705928270735695752830825527963838355419762516246028680280988020401914551825487349990306976304093109384451438813251211051597392127491464898797406789175453067960072008590614886532333015881171367104445044718144312416815712216611576221546455968770801413440778423979",
NULL NULL
}; };
log = fopen("expt.log", "w");
for (n = 0; primes[n]; n++) { for (n = 0; primes[n]; n++) {
mp_read_radix(&a, primes[n], 10); mp_read_radix(&a, primes[n], 10);
mp_zero(&b); mp_zero(&b);
@ -183,12 +204,21 @@ expttime:
exit(0); exit(0);
} }
printf("Exponentiating\t%4d-bit => %9llu/sec, %9llu ticks\n", mp_count_bits(&a), (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt, tt); printf("Exponentiating\t%4d-bit => %9llu/sec, %9llu ticks\n", mp_count_bits(&a), (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt, tt);
fprintf(log, "%d,%9llu\n", mp_count_bits(&a), (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt);
} }
} }
fclose(log);
invtime:
log = fopen("invmod.log", "w");
for (cnt = 4; cnt <= 128; cnt += 4) {
mp_rand(&a, cnt);
mp_rand(&b, cnt);
do {
mp_add_d(&b, 1, &b);
mp_gcd(&a, &b, &c);
} while (mp_cmp_d(&c, 1) != MP_EQ);
mp_read_radix(&a, "340282366920938463463374607431768211455", 10);
mp_read_radix(&b, "234892374891378913789237289378973232333", 10);
while (a.used * DIGIT_BIT < 8192) {
reset(); reset();
for (rr = 0; rr < 10000; rr++) { for (rr = 0; rr < 10000; rr++) {
mp_invmod(&b, &a, &c); mp_invmod(&b, &a, &c);
@ -200,16 +230,18 @@ expttime:
return 0; return 0;
} }
printf("Inverting mod\t%4d-bit => %9llu/sec, %9llu ticks\n", mp_count_bits(&a), (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt, tt); printf("Inverting mod\t%4d-bit => %9llu/sec, %9llu ticks\n", mp_count_bits(&a), (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt, tt);
mp_sqr(&a, &a); fprintf(log, "%d,%9llu\n", cnt, (((unsigned long long)rr)*CLOCKS_PER_SEC)/tt);
mp_sqr(&b, &b);
} }
fclose(log);
return 0; return 0;
#endif #endif
inv_n = expt_n = lcm_n = gcd_n = add_n = sub_n = mul_n = div_n = sqr_n = mul2d_n = div2d_n = 0; div2_n = mul2_n = inv_n = expt_n = lcm_n = gcd_n = add_n =
sub_n = mul_n = div_n = sqr_n = mul2d_n = div2d_n = cnt = 0;
for (;;) { for (;;) {
if (!(++cnt & 15)) sleep(3);
/* randomly clear and re-init one variable, this has the effect of trimming the alloc space */ /* randomly clear and re-init one variable, this has the effect of trimming the alloc space */
switch (abs(rand()) % 7) { switch (abs(rand()) % 7) {
@ -223,7 +255,7 @@ expttime:
} }
printf("%7lu/%7lu/%7lu/%7lu/%7lu/%7lu/%7lu/%7lu/%7lu/%7lu/%7lu/%5d\r", add_n, sub_n, mul_n, div_n, sqr_n, mul2d_n, div2d_n, gcd_n, lcm_n, expt_n, inv_n, _ifuncs); printf("%7lu/%7lu/%7lu/%7lu/%7lu/%7lu/%7lu/%7lu/%7lu/%7lu/%7lu/%7lu/%7lu ", add_n, sub_n, mul_n, div_n, sqr_n, mul2d_n, div2d_n, gcd_n, lcm_n, expt_n, inv_n, div2_n, mul2_n);
fgets(cmd, 4095, stdin); fgets(cmd, 4095, stdin);
cmd[strlen(cmd)-1] = 0; cmd[strlen(cmd)-1] = 0;
printf("%s ]\r",cmd); fflush(stdout); printf("%s ]\r",cmd); fflush(stdout);
@ -386,6 +418,28 @@ draw(&a);draw(&b);draw(&c);draw(&d);
return 0; return 0;
} }
} else if (!strcmp(cmd, "div2")) { ++div2_n;
fgets(buf, 4095, stdin); mp_read_radix(&a, buf, 10);
fgets(buf, 4095, stdin); mp_read_radix(&b, buf, 10);
mp_div_2(&a, &c);
if (mp_cmp(&c, &b) != MP_EQ) {
printf("div_2 %lu failure\n", div2_n);
draw(&a);
draw(&b);
draw(&c);
return 0;
}
} else if (!strcmp(cmd, "mul2")) { ++mul2_n;
fgets(buf, 4095, stdin); mp_read_radix(&a, buf, 10);
fgets(buf, 4095, stdin); mp_read_radix(&b, buf, 10);
mp_mul_2(&a, &c);
if (mp_cmp(&c, &b) != MP_EQ) {
printf("mul_2 %lu failure\n", mul2_n);
draw(&a);
draw(&b);
draw(&c);
return 0;
}
} }
} }


@ -17,4 +17,4 @@ mersenne: mersenne.o
$(CC) mersenne.o $(LIBNAME) -o mersenne $(CC) mersenne.o $(LIBNAME) -o mersenne
clean: clean:
rm -f *.o *.exe pprime tune mersenne rm -f *.log *.o *.obj *.exe pprime tune mersenne

etc/makefile.msvc Normal file

@ -0,0 +1,14 @@
#MSVC Makefile
#
#Tom St Denis
CFLAGS = /I../ /Ogityb2 /Gs /DWIN32 /W3
pprime: pprime.obj
cl pprime.obj ../tommath.lib
mersenne: mersenne.obj
cl mersenne.obj ../tommath.lib
tune: tune.obj
cl tune.obj ../tommath.lib


@ -3,7 +3,7 @@
* Tom St Denis, tomstdenis@iahu.ca * Tom St Denis, tomstdenis@iahu.ca
*/ */
#include <time.h> #include <time.h>
#include <bn.h> #include <tommath.h>
int int
is_mersenne (long s, int *pp) is_mersenne (long s, int *pp)


@ -17,10 +17,10 @@ time_mult (void)
mp_init (&c); mp_init (&c);
t1 = clock (); t1 = clock ();
for (x = 8; x <= 128; x += 8) { for (x = 4; x <= 128; x += 4) {
for (y = 0; y < 1000; y++) {
mp_rand (&a, x); mp_rand (&a, x);
mp_rand (&b, x); mp_rand (&b, x);
for (y = 0; y < 10000; y++) {
mp_mul (&a, &b, &c); mp_mul (&a, &b, &c);
} }
} }
@ -41,9 +41,9 @@ time_sqr (void)
mp_init (&b); mp_init (&b);
t1 = clock (); t1 = clock ();
for (x = 8; x <= 128; x += 8) { for (x = 4; x <= 128; x += 4) {
for (y = 0; y < 1000; y++) {
mp_rand (&a, x); mp_rand (&a, x);
for (y = 0; y < 10000; y++) {
mp_sqr (&a, &b); mp_sqr (&a, &b);
} }
} }
@ -52,20 +52,54 @@ time_sqr (void)
return clock () - t1; return clock () - t1;
} }
clock_t
time_expt (void)
{
clock_t t1;
int x, y;
mp_int a, b, c, d;
mp_init (&a);
mp_init (&b);
mp_init (&c);
mp_init (&d);
t1 = clock ();
for (x = 4; x <= 128; x += 4) {
mp_rand (&a, x);
mp_rand (&b, x);
mp_rand (&c, x);
if (mp_iseven (&c) != 0) {
mp_add_d (&c, 1, &c);
}
for (y = 0; y < 10; y++) {
mp_exptmod (&a, &b, &c, &d);
}
}
mp_clear (&d);
mp_clear (&c);
mp_clear (&b);
mp_clear (&a);
return clock () - t1;
}
int int
main (void) main (void)
{ {
int best_mult, best_square; int best_mult, best_square, best_exptmod;
clock_t best, ti; clock_t best, ti;
FILE *log;
best_mult = best_square = 0; best_mult = best_square = best_exptmod = 0;
/* tune multiplication first */ /* tune multiplication first */
log = fopen ("mult.log", "w");
best = CLOCKS_PER_SEC * 1000; best = CLOCKS_PER_SEC * 1000;
for (KARATSUBA_MUL_CUTOFF = 8; KARATSUBA_MUL_CUTOFF <= 128; for (KARATSUBA_MUL_CUTOFF = 8; KARATSUBA_MUL_CUTOFF <= 128; KARATSUBA_MUL_CUTOFF++) {
KARATSUBA_MUL_CUTOFF++) {
ti = time_mult (); ti = time_mult ();
printf ("%4d : %9lu\r", KARATSUBA_MUL_CUTOFF, ti); printf ("%4d : %9lu\r", KARATSUBA_MUL_CUTOFF, ti);
fprintf (log, "%d, %lu\n", KARATSUBA_MUL_CUTOFF, ti);
fflush (stdout); fflush (stdout);
if (ti < best) { if (ti < best) {
printf ("New best: %lu, %d \n", ti, KARATSUBA_MUL_CUTOFF); printf ("New best: %lu, %d \n", ti, KARATSUBA_MUL_CUTOFF);
@ -73,13 +107,15 @@ main (void)
best_mult = KARATSUBA_MUL_CUTOFF; best_mult = KARATSUBA_MUL_CUTOFF;
} }
} }
fclose (log);
/* tune squaring */ /* tune squaring */
log = fopen ("sqr.log", "w");
best = CLOCKS_PER_SEC * 1000; best = CLOCKS_PER_SEC * 1000;
for (KARATSUBA_SQR_CUTOFF = 8; KARATSUBA_SQR_CUTOFF <= 128; for (KARATSUBA_SQR_CUTOFF = 8; KARATSUBA_SQR_CUTOFF <= 128; KARATSUBA_SQR_CUTOFF++) {
KARATSUBA_SQR_CUTOFF++) {
ti = time_sqr (); ti = time_sqr ();
printf ("%4d : %9lu\r", KARATSUBA_SQR_CUTOFF, ti); printf ("%4d : %9lu\r", KARATSUBA_SQR_CUTOFF, ti);
fprintf (log, "%d, %lu\n", KARATSUBA_SQR_CUTOFF, ti);
fflush (stdout); fflush (stdout);
if (ti < best) { if (ti < best) {
printf ("New best: %lu, %d \n", ti, KARATSUBA_SQR_CUTOFF); printf ("New best: %lu, %d \n", ti, KARATSUBA_SQR_CUTOFF);
@ -87,10 +123,30 @@ main (void)
best_square = KARATSUBA_SQR_CUTOFF; best_square = KARATSUBA_SQR_CUTOFF;
} }
} }
fclose (log);
/* tune exptmod */
KARATSUBA_MUL_CUTOFF = best_mult;
KARATSUBA_SQR_CUTOFF = best_square;
log = fopen ("expt.log", "w");
best = CLOCKS_PER_SEC * 1000;
for (MONTGOMERY_EXPT_CUTOFF = 8; MONTGOMERY_EXPT_CUTOFF <= 192; MONTGOMERY_EXPT_CUTOFF++) {
ti = time_expt ();
printf ("%4d : %9lu\r", MONTGOMERY_EXPT_CUTOFF, ti);
fflush (stdout);
fprintf (log, "%d, %lu\n", MONTGOMERY_EXPT_CUTOFF, ti);
if (ti < best) {
printf ("New best: %lu, %d\n", ti, MONTGOMERY_EXPT_CUTOFF);
best = ti;
best_exptmod = MONTGOMERY_EXPT_CUTOFF;
}
}
fclose (log);
printf printf
("\n\n\nKaratsuba Multiplier Cutoff: %d\nKaratsuba Squaring Cutoff: %d\n", ("\n\n\nKaratsuba Multiplier Cutoff: %d\nKaratsuba Squaring Cutoff: %d\nMontgomery exptmod Cutoff: %d\n",
best_mult, best_square); best_mult, best_square, best_exptmod);
return 0; return 0;
} }


@ -1,6 +1,6 @@
CFLAGS += -I./ -Wall -W -Wshadow -O3 -fomit-frame-pointer -funroll-loops CFLAGS += -I./ -Wall -W -Wshadow -O3 -fomit-frame-pointer -funroll-loops
VERSION=0.13 VERSION=0.14
default: libtommath.a default: libtommath.a
@ -60,7 +60,7 @@ docs: docdvi
rm -f bn.log bn.aux bn.dvi rm -f bn.log bn.aux bn.dvi
clean: clean:
rm -f *.pdf *.o *.a *.exe etclib/*.o demo/demo.o test ltmtest mpitest mtest/mtest mtest/mtest.exe \ rm -f *.pdf *.o *.a *.obj *.lib *.exe etclib/*.o demo/demo.o test ltmtest mpitest mtest/mtest mtest/mtest.exe \
bn.log bn.aux bn.dvi *.log *.s mpi.c bn.log bn.aux bn.dvi *.log *.s mpi.c
cd etc ; make clean cd etc ; make clean

makefile.msvc Normal file

@ -0,0 +1,26 @@
#MSVC Makefile
#
#Tom St Denis
CFLAGS = /I. /Ogityb2 /Gs /DWIN32 /W3
default: library
OBJECTS=bncore.obj bn_mp_init.obj bn_mp_clear.obj bn_mp_exch.obj bn_mp_grow.obj bn_mp_shrink.obj \
bn_mp_clamp.obj bn_mp_zero.obj bn_mp_set.obj bn_mp_set_int.obj bn_mp_init_size.obj bn_mp_copy.obj \
bn_mp_init_copy.obj bn_mp_abs.obj bn_mp_neg.obj bn_mp_cmp_mag.obj bn_mp_cmp.obj bn_mp_cmp_d.obj \
bn_mp_rshd.obj bn_mp_lshd.obj bn_mp_mod_2d.obj bn_mp_div_2d.obj bn_mp_mul_2d.obj bn_mp_div_2.obj \
bn_mp_mul_2.obj bn_s_mp_add.obj bn_s_mp_sub.obj bn_fast_s_mp_mul_digs.obj bn_s_mp_mul_digs.obj \
bn_fast_s_mp_mul_high_digs.obj bn_s_mp_mul_high_digs.obj bn_fast_s_mp_sqr.obj bn_s_mp_sqr.obj \
bn_mp_add.obj bn_mp_sub.obj bn_mp_karatsuba_mul.obj bn_mp_mul.obj bn_mp_karatsuba_sqr.obj \
bn_mp_sqr.obj bn_mp_div.obj bn_mp_mod.obj bn_mp_add_d.obj bn_mp_sub_d.obj bn_mp_mul_d.obj \
bn_mp_div_d.obj bn_mp_mod_d.obj bn_mp_expt_d.obj bn_mp_addmod.obj bn_mp_submod.obj \
bn_mp_mulmod.obj bn_mp_sqrmod.obj bn_mp_gcd.obj bn_mp_lcm.obj bn_fast_mp_invmod.obj bn_mp_invmod.obj \
bn_mp_reduce.obj bn_mp_montgomery_setup.obj bn_fast_mp_montgomery_reduce.obj bn_mp_montgomery_reduce.obj \
bn_mp_exptmod_fast.obj bn_mp_exptmod.obj bn_mp_2expt.obj bn_mp_n_root.obj bn_mp_jacobi.obj bn_reverse.obj \
bn_mp_count_bits.obj bn_mp_read_unsigned_bin.obj bn_mp_read_signed_bin.obj bn_mp_to_unsigned_bin.obj \
bn_mp_to_signed_bin.obj bn_mp_unsigned_bin_size.obj bn_mp_signed_bin_size.obj bn_radix.obj \
bn_mp_xor.obj bn_mp_and.obj bn_mp_or.obj bn_mp_rand.obj bn_mp_montgomery_calc_normalization.obj
library: $(OBJECTS)
lib /out:tommath.lib $(OBJECTS)


@ -41,7 +41,7 @@ void rand_num(mp_int *a)
unsigned char buf[512]; unsigned char buf[512];
top: top:
size = 1 + ((fgetc(rng)*fgetc(rng)) % 96); size = 1 + ((fgetc(rng)*fgetc(rng)) % 512);
buf[0] = (fgetc(rng)&1)?1:0; buf[0] = (fgetc(rng)&1)?1:0;
fread(buf+1, 1, size, rng); fread(buf+1, 1, size, rng);
for (n = 0; n < size; n++) { for (n = 0; n < size; n++) {
@ -57,7 +57,7 @@ void rand_num2(mp_int *a)
unsigned char buf[512]; unsigned char buf[512];
top: top:
size = 1 + ((fgetc(rng)*fgetc(rng)) % 96); size = 1 + ((fgetc(rng)*fgetc(rng)) % 512);
buf[0] = (fgetc(rng)&1)?1:0; buf[0] = (fgetc(rng)&1)?1:0;
fread(buf+1, 1, size, rng); fread(buf+1, 1, size, rng);
for (n = 0; n < size; n++) { for (n = 0; n < size; n++) {
@ -73,6 +73,8 @@ int main(void)
mp_int a, b, c, d, e; mp_int a, b, c, d, e;
char buf[4096]; char buf[4096];
static int tests[] = { 11, 12 };
mp_init(&a); mp_init(&a);
mp_init(&b); mp_init(&b);
mp_init(&c); mp_init(&c);
@ -89,7 +91,7 @@ int main(void)
} }
for (;;) { for (;;) {
n = 4; // fgetc(rng) % 11; n = fgetc(rng) % 13;
if (n == 0) { if (n == 0) {
/* add tests */ /* add tests */
@ -235,6 +237,23 @@ int main(void)
printf("%s\n", buf); printf("%s\n", buf);
mp_todecimal(&c, buf); mp_todecimal(&c, buf);
printf("%s\n", buf); printf("%s\n", buf);
} else if (n == 11) {
rand_num(&a);
mp_mul_2(&a, &a);
mp_div_2(&a, &b);
printf("div2\n");
mp_todecimal(&a, buf);
printf("%s\n", buf);
mp_todecimal(&b, buf);
printf("%s\n", buf);
} else if (n == 12) {
rand_num2(&a);
mp_mul_2(&a, &b);
printf("mul2\n");
mp_todecimal(&a, buf);
printf("%s\n", buf);
mp_todecimal(&b, buf);
printf("%s\n", buf);
} }
} }
fclose(rng); fclose(rng);


@ -1,36 +0,0 @@
CLOCKS_PER_SEC == 1000
Adding 128-bit => 14534883/sec, 688 ticks
Adding 256-bit => 11037527/sec, 906 ticks
Adding 512-bit => 8650519/sec, 1156 ticks
Adding 1024-bit => 5871990/sec, 1703 ticks
Adding 2048-bit => 3575259/sec, 2797 ticks
Adding 4096-bit => 2018978/sec, 4953 ticks
Subtracting 128-bit => 11025358/sec, 907 ticks
Subtracting 256-bit => 9149130/sec, 1093 ticks
Subtracting 512-bit => 7440476/sec, 1344 ticks
Subtracting 1024-bit => 5078720/sec, 1969 ticks
Subtracting 2048-bit => 3168567/sec, 3156 ticks
Subtracting 4096-bit => 1833852/sec, 5453 ticks
Squaring 128-bit => 3205128/sec, 78 ticks
Squaring 256-bit => 1592356/sec, 157 ticks
Squaring 512-bit => 696378/sec, 359 ticks
Squaring 1024-bit => 266808/sec, 937 ticks
Squaring 2048-bit => 85999/sec, 2907 ticks
Squaring 4096-bit => 21949/sec, 11390 ticks
Multiplying 128-bit => 3205128/sec, 78 ticks
Multiplying 256-bit => 1592356/sec, 157 ticks
Multiplying 512-bit => 615763/sec, 406 ticks
Multiplying 1024-bit => 192752/sec, 1297 ticks
Multiplying 2048-bit => 53510/sec, 4672 ticks
Multiplying 4096-bit => 14801/sec, 16890 ticks
Exponentiating 513-bit => 531/sec, 47 ticks
Exponentiating 769-bit => 177/sec, 141 ticks
Exponentiating 1025-bit => 88/sec, 282 ticks
Exponentiating 2049-bit => 13/sec, 1890 ticks
Exponentiating 2561-bit => 6/sec, 3812 ticks
Exponentiating 3073-bit => 4/sec, 6031 ticks
Exponentiating 4097-bit => 1/sec, 12843 ticks
Inverting mod 128-bit => 19160/sec, 5219 ticks
Inverting mod 256-bit => 8290/sec, 12062 ticks
Inverting mod 512-bit => 3565/sec, 28047 ticks
Inverting mod 1024-bit => 1305/sec, 76594 ticks


@ -1,36 +0,0 @@
CLOCKS_PER_SEC == 1000
Adding 128-bit => 15600624/sec, 641 ticks
Adding 256-bit => 12804097/sec, 781 ticks
Adding 512-bit => 10000000/sec, 1000 ticks
Adding 1024-bit => 7032348/sec, 1422 ticks
Adding 2048-bit => 4076640/sec, 2453 ticks
Adding 4096-bit => 2424242/sec, 4125 ticks
Subtracting 128-bit => 10845986/sec, 922 ticks
Subtracting 256-bit => 9416195/sec, 1062 ticks
Subtracting 512-bit => 7710100/sec, 1297 ticks
Subtracting 1024-bit => 5159958/sec, 1938 ticks
Subtracting 2048-bit => 3299241/sec, 3031 ticks
Subtracting 4096-bit => 1987676/sec, 5031 ticks
Squaring 128-bit => 3205128/sec, 78 ticks
Squaring 256-bit => 1592356/sec, 157 ticks
Squaring 512-bit => 696378/sec, 359 ticks
Squaring 1024-bit => 266524/sec, 938 ticks
Squaring 2048-bit => 86505/sec, 2890 ticks
Squaring 4096-bit => 22471/sec, 11125 ticks
Multiplying 128-bit => 3205128/sec, 78 ticks
Multiplying 256-bit => 1592356/sec, 157 ticks
Multiplying 512-bit => 615763/sec, 406 ticks
Multiplying 1024-bit => 190548/sec, 1312 ticks
Multiplying 2048-bit => 54418/sec, 4594 ticks
Multiplying 4096-bit => 14897/sec, 16781 ticks
Exponentiating 513-bit => 531/sec, 47 ticks
Exponentiating 769-bit => 177/sec, 141 ticks
Exponentiating 1025-bit => 84/sec, 297 ticks
Exponentiating 2049-bit => 13/sec, 1875 ticks
Exponentiating 2561-bit => 6/sec, 3766 ticks
Exponentiating 3073-bit => 4/sec, 6000 ticks
Exponentiating 4097-bit => 1/sec, 12750 ticks
Inverting mod 128-bit => 17301/sec, 578 ticks
Inverting mod 256-bit => 8103/sec, 1234 ticks
Inverting mod 512-bit => 3422/sec, 2922 ticks
Inverting mod 1024-bit => 1330/sec, 7516 ticks


@ -1,5 +0,0 @@
Exponentiating 513-bit => 531/sec, 94 ticks
Exponentiating 769-bit => 187/sec, 266 ticks
Exponentiating 1025-bit => 88/sec, 562 ticks
Exponentiating 2049-bit => 13/sec, 3719 ticks


@ -10,7 +10,7 @@
* The library is free for all purposes without any express * The library is free for all purposes without any express
* guarantee it works. * guarantee it works.
* *
* Tom St Denis, tomstdenis@iahu.ca, http://libtommath.iahu.ca * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
*/ */
#ifndef BN_H_ #ifndef BN_H_
#define BN_H_ #define BN_H_