This week I add squaring for p = 16 and it give good speed up for 100 digit prime (actually, for ~100 digit number used my implementations of squaring). After this we can prove 100 digit number primality for 0.13 sec and 300 digit number for 8.3 sec.
The next thing I was working on is Montgomery modular multiplication and reduction.
Firstly, we need to convert our unity_zp structure (recall that it is just a fmpz_mod_poly, mod = n) into Montgomery. For this we need to select $r > n$ such that $r = 2^k$ and compute $f * r$ there $f$ - unity_zp; so, element $x$ in Montgomery form can be represented as $x = a * r$, there $a$ from simple form. Because we do not want to use fmpz_mod or division functions we can store our unity_zp_mont data into fmpz_poly. It was implemented in unity_zp_mont_to() and init functions.
Secondly, we need to implement basic operations. Addition and subtraction are the same as ordinary modular addition and subtraction: $x + y = a*r + b*r = (a + b)r = z$ there $x$, $y$ and $z$ in Montgomery form. Multiplications works a little harder: $x * y = (a*r) * (b*r) = (a*b)*r^2$, so we need to divide by $r$. To do it fast exists special REDC function, which uses the fact that $r = 2^k$ and we can use binary shift operations instead of division.
Using this form in exponentiation function gives small speed up (8 sec instead of 8.3 for 300 digits number). I think, if I implement REDC function in lower level (GMP have some implementations, but them not have public interface) this will give even a small increase in speed. Moreover, I not use REDC for small p = 4, 5, 7, 9 and 16, so Montgomery stuff not used for small numbers.
The next thing I was working on is Montgomery modular multiplication and reduction.
Firstly, we need to convert our unity_zp structure (recall that it is just a fmpz_mod_poly, mod = n) into Montgomery. For this we need to select $r > n$ such that $r = 2^k$ and compute $f * r$ there $f$ - unity_zp; so, element $x$ in Montgomery form can be represented as $x = a * r$, there $a$ from simple form. Because we do not want to use fmpz_mod or division functions we can store our unity_zp_mont data into fmpz_poly. It was implemented in unity_zp_mont_to() and init functions.
Secondly, we need to implement basic operations. Addition and subtraction are the same as ordinary modular addition and subtraction: $x + y = a*r + b*r = (a + b)r = z$ there $x$, $y$ and $z$ in Montgomery form. Multiplications works a little harder: $x * y = (a*r) * (b*r) = (a*b)*r^2$, so we need to divide by $r$. To do it fast exists special REDC function, which uses the fact that $r = 2^k$ and we can use binary shift operations instead of division.
Using this form in exponentiation function gives small speed up (8 sec instead of 8.3 for 300 digits number). I think, if I implement REDC function in lower level (GMP have some implementations, but them not have public interface) this will give even a small increase in speed. Moreover, I not use REDC for small p = 4, 5, 7, 9 and 16, so Montgomery stuff not used for small numbers.
No comments:
Post a Comment