This week not much has changed, but I tried a couple of things that gave nothing.
After using Montgomery form for general multiplication and squaring I try to use it in special sqr functions (like unity_zp_sqr7). For this I add some modular functions for fmpz in Montgomery form (add, sub, mul: inlined in aprcl.h file) and add unity_zp_mont_sqr7() function. Using it instead of unity_zp_sqr7 and simple form works slower. Maybe, I implement something not very accurately.
I also add unity_zp_sqr11() function and it gave a good speed up (~10% for 300 digits number; we can test 300 digits $n$ for 7.3 sec). Functions like this give a better performance, but them require a lot time to code and debug. So far I have used the Supplement to Implementation of a New Primality Test, but the better way is to think about generalization.
On the other hand if I add similar functions for $p^k = 13, 27, 32$ this will be enough for the numbers up to ~1000 of digits. I also need to have a better look at $R$ and $s$ selection.
I think that I need to rewrite unity_zp_sqr and unity_zp_mul functions to reduce modulo $n$ after cyclotomic poly reduction, so we have less number of fmpz_mod function calls.
Only 4 weeks left until soft GSoC deadline, so I have to start thinking about future code integration (as first step I want to create new branch and remove files related with Gauss sum test).
Valgrind callgrind call graphs for 100 and 300 digits numbers.
After using Montgomery form for general multiplication and squaring I try to use it in special sqr functions (like unity_zp_sqr7). For this I add some modular functions for fmpz in Montgomery form (add, sub, mul: inlined in aprcl.h file) and add unity_zp_mont_sqr7() function. Using it instead of unity_zp_sqr7 and simple form works slower. Maybe, I implement something not very accurately.
I also add unity_zp_sqr11() function and it gave a good speed up (~10% for 300 digits number; we can test 300 digits $n$ for 7.3 sec). Functions like this give a better performance, but them require a lot time to code and debug. So far I have used the Supplement to Implementation of a New Primality Test, but the better way is to think about generalization.
from 50 to 100 digits |
from 50 to 300 digits |
I think that I need to rewrite unity_zp_sqr and unity_zp_mul functions to reduce modulo $n$ after cyclotomic poly reduction, so we have less number of fmpz_mod function calls.
Only 4 weeks left until soft GSoC deadline, so I have to start thinking about future code integration (as first step I want to create new branch and remove files related with Gauss sum test).
Valgrind callgrind call graphs for 100 and 300 digits numbers.
No comments:
Post a Comment