Tuesday 25 August 2015

GSoC with flint: APR-CL primality test

Over the last couple of weeks I added a lot of documentation, implemented important tests and fixed a couple of minor bugs, so this post is a summary of my implementation of a primality test algorithm during GSoC with flint.

APR-CL is the most widely used algorithm for proving primality of general numbers. APR-CL (Adleman, Pomerance, Rumely, Cohen, Lenstra) is a modification of the APR test. There are several versions of this algorithm; my implementation mostly follows [1] and [2]. Other modern tests are ECPP (elliptic curve primality proving) and AKS (Agrawal–Kayal–Saxena).

The APR-CL test uses the idea that we can not only check some conditions, but also collect this information and finally conclude that the number has no nontrivial divisors. APR-CL uses congruences analogous to Fermat's. After we pass some of these tests we know that all possible divisors of $n$ lie in a small set of numbers, and to check that the number is prime we only need "several" divisions ("several" can mean millions of divisions... but that is not very much for numbers with thousands of digits). For more information see my previous posts: (1), (2), (3).

My implementation is mostly based on [1]. I implemented a simple version that does not use $p-1$, $p+1$ or Lucas-style tests. While coding I looked at the PARI/GP and mpz_aprcl implementations and took some initial values (the $t$'s) from PARI. After that I spent some time on "tuning" and now I believe that I chose good values for numbers up to 2000 digits (but perhaps they can still be improved).

So, the algorithm consists of three steps:

  • $R$ and $s$ selection
  • Jacobi sum test
  • Final division


As before, $n$ is the number to test for primality. We choose $s$ such that $s^2 > n$, $gcd(sR, n) = 1$ and $a^R \equiv 1 \pmod{s}$ for all $a$ coprime to $s$. Here we have two goals: minimize $s$ and minimize the cost of the future Jacobi sum tests for the divisors of $s$. I implemented only the first; the second also requires research (some ideas are described in [3]). The cost of the final division step depends on the size of $R$: there we need to do $R$ divisions, which takes a meaningful amount of time for big $n$.
The $s$ reduction step is described in this post. We can also select $s$ with only $s^3 > n$; all the papers present this as an important improvement, but I did not use it: with the flint implementation of the Divisors in Residue Classes algorithm I did not get any speed gain for the tested numbers.
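As a small worked example of the relation between $R$ and $s$ (my numbers, not values used by the actual test): take $R = 12$. The primes $q$ with $q - 1 \mid 12$ are $2, 3, 5, 7, 13$, so $$s = 2 \cdot 3 \cdot 5 \cdot 7 \cdot 13 = 2730.$$ For every prime $q \mid s$ Fermat's little theorem gives $a^{q-1} \equiv 1 \pmod{q}$, and since $q - 1 \mid R$ also $a^R \equiv 1 \pmod{q}$; by CRT, $a^R \equiv 1 \pmod{s}$ for every $a$ coprime to $s$. Such an $s$ is already enough for any $n < s^2 = 7452900$.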

The essential part of the Jacobi sum step is to verify that a power of a product of Jacobi sums is a root of unity, so the longest part of this step is the exponentiation. As before we represent Jacobi sums as polynomials modulo a cyclotomic polynomial, so we need to optimize modular polynomial exponentiation. This can be done in two directions: optimize the powering function or the polynomial multiplication/squaring functions. I worked on both and implemented the $2^k$-ary and sliding window exponentiation methods and special functions for multiplication and squaring.
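To illustrate the $2^k$-ary idea separately from the flint polynomial code, here is a minimal sketch in plain C over machine words (mulmod and powmod_2k are hypothetical helpers of mine, not flint functions; the real code applies the same scheme to unity_zp polynomials):

    #include <stdint.h>
    #include <stdio.h>

    /* (a * b) mod n via a 128-bit intermediate. */
    static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n)
    {
        return (uint64_t)(((unsigned __int128) a * b) % n);
    }

    /* 2^k-ary exponentiation: precompute base^0 .. base^(2^k - 1), then
       consume the exponent k bits at a time: k squarings plus at most
       one multiplication per window. */
    static uint64_t powmod_2k(uint64_t base, uint64_t exp, uint64_t n, int k)
    {
        uint64_t table[256];              /* supports k <= 8 */
        uint64_t res = 1 % n, e, w;
        int bits = 0, i, j;

        table[0] = 1 % n;
        for (i = 1; i < (1 << k); i++)
            table[i] = mulmod(table[i - 1], base % n, n);

        for (e = exp; e != 0; e >>= 1)    /* bit length of exp */
            bits++;
        bits = ((bits + k - 1) / k) * k;  /* round up to a multiple of k */

        for (i = bits - k; i >= 0; i -= k)
        {
            for (j = 0; j < k; j++)
                res = mulmod(res, res, n);                /* k squarings */
            w = (exp >> i) & (((uint64_t) 1 << k) - 1);   /* next window */
            if (w != 0)
                res = mulmod(res, table[w], n);
        }
        return res;
    }

    int main(void)
    {
        /* 3^1000003 mod 1000000007 */
        printf("%llu\n", (unsigned long long) powmod_2k(3, 1000003, 1000000007, 4));
        return 0;
    }

The sliding window variant differs only in that windows are chosen to start and end at set bits, so the table needs only the odd powers.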
In the future, this step can be improved by adding more special mul/sqr functions (which, I believe, could be generated automatically using some optimization methods). Also, before this step we can use Lucas-style tests to rule out some divisors and exclude them from the Jacobi sum test. Finally, we can use special forms of numbers to reduce the number of divisions (mod operations). For example, Montgomery form was implemented and tested, but it did not give any speedup. I think this was mostly because I did not implement it carefully, and a better implementation could give some speed boost.
Some of these and other ideas are described in [3].

In the final step we just need to verify that $n$ is prime. It is very unlikely that we discover $n$ to be composite at this step, so APR-CL cannot be used as an effective factorization algorithm. At this step we know where all possible divisors of $n$ can be found: if $s^2 > n$ then we check divisors of the form $n^i \bmod s$; if only $s^3 > n$ then we use the Divisors in Residue Classes algorithm.
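A sketch of the $s^2 > n$ case (plain C over machine words, with a hypothetical mulmod helper; the real code uses fmpz arithmetic):

    #include <stdint.h>

    static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n)
    {
        return (uint64_t)(((unsigned __int128) a * b) % n);
    }

    /* Final division, assuming s * s > n: every divisor of n must be
       congruent to n^i (mod s) for some i < R, so we just try them all.
       Returns 1 if n has no nontrivial divisor (n prime), 0 otherwise. */
    static int final_division(uint64_t n, uint64_t s, uint64_t R)
    {
        uint64_t i, r = 1 % s;
        for (i = 1; i < R; i++)
        {
            r = mulmod(r, n % s, s);      /* r = n^i mod s */
            if (r > 1 && r != n && n % r == 0)
                return 0;                 /* nontrivial divisor found */
        }
        return 1;
    }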

One of the easy and promising things to do is parallelization of the second and third steps. I think it would give a nearly linear speedup in the number of processors, since the only dependency is on data (the $q$ values in the second step and the divisions in the third), which divides up easily.

Finally, I provide timings for the current version. It can prove primality of a 100 digit number in 0.7 seconds, a 300 digit number in 4 seconds and a 2000 digit number in about one hour on my 2.7 GHz Core i5 processor.
[Plot: timings for numbers from 50 to 300 digits]

[Plot: timings for numbers from 300 to 2000 digits]
So, I think it has good timings, and I believe there are many possible improvements left. I hope my code is useful for flint and will be merged after all the necessary corrections.

The code also provides the Gauss sum test described in [4], which is not very useful in practice, but can be useful for people who want to see an implementation of it.

Thanks flint and Google for this great summer of code!

[1] - "Implementation of a New Primality Test" by H. Cohen and A. K. Lenstra
[2] - "A Course in Computational Algebraic Number Theory" by H. Cohen
[3] - "Primality Proving with Cyclotomy" by Wiebren Bosma
[4] - "Four primality testing algorithms" by René Schoof

Wednesday 5 August 2015

GSoC week 10; Some infrastructure code

This week I added a lot of comments and some documentation. Then I went through all compiler warnings and fixed a couple of bugs (in the tests).

I created a new branch which contains all the "clean" code (I removed the Montgomery form and some extra files), so that we can review it and, if everything is okay, merge it.
[Plot: timings from profiling code]
I added profiling code with five primes of each size 50, 60, ..., 300 digits. I took the numbers from the List of small primes, so anyone can now easily look at the timings on their computer. I also ran the test for 2^2000+841 (603 digits) and 2^3515+159 (1059 digits); it took 60 and 700 seconds respectively.

The PARI/GP implementation of the Jacobi sum computation uses a table of size q/2, while I use two tables of size q (to compute the function f(x) with $g^{f(x)} \equiv 1 - g^x \pmod{q}$, where $g$ is a primitive root modulo $q$). Now I'm working on this. It will save a few megabytes of RAM for large numbers and maybe give a small speedup.
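A sketch of how such a table can be built for small $q$ (jacobi_f_table is a hypothetical name of mine; the real code keeps the tables inside the config structure):

    #include <stdlib.h>

    /* Build the table f with g^f(x) = 1 - g^x (mod q), where g is a
       primitive root modulo q. f(x) is undefined (left as 0) when
       1 - g^x == 0 mod q, which cannot happen for 1 <= x <= q - 2. */
    static long * jacobi_f_table(long q, long g)
    {
        long *ind = calloc(q, sizeof(long));  /* discrete logs: ind[g^x mod q] = x */
        long *f = calloc(q, sizeof(long));
        long x, pow = 1;

        for (x = 0; x < q - 1; x++)
        {
            ind[pow] = x;
            pow = (pow * g) % q;
        }
        pow = 1;
        for (x = 1; x < q - 1; x++)
        {
            pow = (pow * g) % q;              /* pow = g^x mod q */
            if (pow != 1)
                f[x] = ind[(q + 1 - pow) % q];
        }
        free(ind);
        return f;                             /* caller frees */
    }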

Tuesday 28 July 2015

GSoC week 9; Special multiplication and s reduction

Over the last week I tried a couple of interesting things which speed up the test nicely.

I added config functions which try to reduce the $s$ value. If $s$ has a prime power factor $q^k$ for which $s/q^k > n^{1/2}$, then we can choose the biggest such $q$ and set $s = s/q^k$. This is repeated until it is no longer possible (_jacobi_config_reduce_s()), as sketched below. It gave a nice speed boost for numbers $n$ which are much smaller than the computed value of $s$. This optimization enabled us to slightly outperform PARI/GP and to "smooth" the computation time (no spikes).
[Plot: timings up to 300 digits]
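Schematically, the reduction does the following (a sketch over hypothetical machine-word arrays holding the prime power factors of $s$; the real _jacobi_config_reduce_s() works with fmpz values):

    #include <stdint.h>

    /* While some prime power factor q^k of s can be removed with the
       quotient still exceeding sqrt(n), remove the one with the largest q.
       qpow[i] = q_i^{k_i}, sorted by increasing q_i; used[i] marks the
       factors still present in s. */
    static uint64_t reduce_s(uint64_t s, uint64_t sqrt_n,
                             const uint64_t *qpow, int *used, int num)
    {
        int i, progress = 1;
        while (progress)
        {
            progress = 0;
            for (i = num - 1; i >= 0; i--)      /* largest q first */
            {
                if (used[i] && s / qpow[i] > sqrt_n)
                {
                    s /= qpow[i];               /* exact: qpow[i] divides s */
                    used[i] = 0;
                    progress = 1;
                    break;
                }
            }
        }
        return s;
    }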
Next, following the article, I implemented a better $s$ reduction function (_jacobi_config_reduce_s2()), which is an approximation to the solution of a knapsack problem. We assign each $q$ the weight $w(q) = \sum\limits_{p^j \| q-1}{\phi(p^j)^2}$, where $\phi$ denotes Euler's function and the sum runs over the prime powers exactly dividing $q - 1$. Then we find a prime power factor $q^k$ of $s$ with maximal $w(q)$ and set $s = s/q^k$. Doing this as long as possible yields a good $s$ value. It also gave a small speed boost. For a better explanation and references you can look at chapter 4 of "Primality Testing and Jacobi Sums". As far as I can see, this is not implemented in the PARI/GP aprcl.
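For example (my computation, just to illustrate the weight): for $q = 13$ we have $q - 1 = 12 = 2^2 \cdot 3$, so $w(13) = \phi(2^2)^2 + \phi(3)^2 = 2^2 + 2^2 = 8$.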

Another thing which gave interesting results: I removed $t = 166320 = 2^4 \cdot 3^3 \cdot 5 \cdot 7 \cdot 11$ and always use $t = 720720 = 2^4 \cdot 3^2 \cdot 5 \cdot 7 \cdot 11 \cdot 13$. Together with the $s$ reduction function this gives a ~0.5 second speedup for a 300 digit prime (5.6 sec instead of 6.1). This is because we do not have special sqr/mul functions for p = 13 or p = 27, but for p = 27 we get polys of length 17 while for p = 13 the length is 12. After applying the $s$ reduction function we get an $s$ of similar size in both cases, but use faster multiplication/squaring. This works for all numbers which previously used t = 166320. There is also room for optimization here, because now in the final division step we need to do 720720 multiplications and divisions, which takes ~15% of the total time.

The next thing to look at in the future is special sqr/mul functions for all $p$ values. I looked at p = 13 and wrote out the result for squaring, and I don't think it is a good idea to work on this manually. It looks like a kind of common subexpression elimination task, and I think I can find some algorithm for this later...

I also added special multiplication functions for p = 3, 4, 5, 7, 8, 9, 11 and 16. This gave just a couple of percent increase, but almost for free :)

So now the test takes 0.115 sec for 100 digit and 5.6 sec for 300 digit numbers (I think I have never posted my processor type: a Core2Duo T6670 2.2GHz, so I do not have the AVX instruction set, which maybe could be used in gmp).

This week I want to fix all the warnings I have, do some refactoring and add more documentation. I also want to add something similar to a profile folder with primes of different sizes (from https://primes.utm.edu/lists/small/ , for example) and outputs. I have this for myself, but it was not written very carefully and is not committed.

Tuesday 21 July 2015

GSoC week 8; More squaring functions

This week not much has changed: I tried a couple of things, but they gave nothing.

After using Montgomery form for general multiplication and squaring, I tried to use it in the special sqr functions (like unity_zp_sqr7). For this I added some modular functions for fmpz in Montgomery form (add, sub, mul; inlined in the aprcl.h file) and added a unity_zp_mont_sqr7() function. Using it instead of unity_zp_sqr7 and the simple form works slower. Maybe I implemented something not very accurately.

I also added a unity_zp_sqr11() function and it gave a good speedup (~10% for 300 digit numbers; we can now test a 300 digit $n$ in 7.3 sec). Functions like this give better performance, but they require a lot of time to code and debug. So far I have used the Supplement to "Implementation of a New Primality Test", but the better way is to think about generalization.
[Plot: timings from 50 to 100 digits]

[Plot: timings from 50 to 300 digits]


On the other hand, if I add similar functions for $p^k = 13, 27, 32$, this will be enough for numbers up to ~1000 digits. I also need to have a better look at the $R$ and $s$ selection.

I think I need to rewrite the unity_zp_sqr and unity_zp_mul functions to reduce modulo $n$ only after the reduction modulo the cyclotomic poly, so that we have fewer fmpz_mod function calls.
Only 4 weeks are left until the soft GSoC deadline, so I have to start thinking about future code integration (as a first step I want to create a new branch and remove the files related to the Gauss sum test).

Valgrind callgrind call graphs for 100 and 300 digit numbers.

Tuesday 14 July 2015

GSoC week 7; Montgomery modular stuff

This week I added squaring for p = 16 and it gives a good speedup for 100 digit primes (actually, for ~100 digit numbers my implementations of squaring are used). After this we can prove primality of a 100 digit number in 0.13 sec and of a 300 digit number in 8.3 sec.

The next thing I was working on is Montgomery modular multiplication and reduction.
First, we need to convert our unity_zp structure (recall that it is just an fmpz_mod_poly with mod = n) into Montgomery form. For this we need to select $r > n$ such that $r = 2^k$ and compute $f \cdot r$, where $f$ is the unity_zp; so an element $x$ in Montgomery form can be represented as $x = a \cdot r$, where $a$ is in ordinary form. Because we do not want to use fmpz_mod or division functions, we can store our unity_zp_mont data in an fmpz_poly. This is implemented in unity_zp_mont_to() and the init functions.

Second, we need to implement the basic operations. Addition and subtraction are the same as ordinary modular addition and subtraction: $x + y = ar + br = (a + b)r = z$, where $x$, $y$ and $z$ are in Montgomery form. Multiplication is a bit harder: $x \cdot y = (ar)(br) = (ab)r^2$, so we need to divide by $r$. To do this fast there exists the special REDC function, which uses the fact that $r = 2^k$, so we can use binary shift operations instead of division.
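A word-level sketch of REDC in plain C (the real code works on multi-word fmpz values; this only shows the idea, assuming an odd modulus $n < 2^{63}$ so the 128-bit sum cannot overflow):

    #include <stdint.h>

    /* REDC with r = 2^64: given T < n * 2^64 and ninv = -n^{-1} mod 2^64,
       returns T / 2^64 mod n using only multiplications and shifts. */
    static uint64_t redc(unsigned __int128 T, uint64_t n, uint64_t ninv)
    {
        uint64_t m = (uint64_t) T * ninv;  /* (T mod 2^64) * ninv mod 2^64 */
        uint64_t t = (uint64_t)((T + (unsigned __int128) m * n) >> 64);
        return t >= n ? t - n : t;         /* one conditional subtraction */
    }

    /* Precompute ninv by Newton iteration: for odd n, inv = n is correct
       to 3 bits (odd squares are 1 mod 8) and each step doubles that. */
    static uint64_t ninv_precompute(uint64_t n)
    {
        uint64_t inv = n;
        int i;
        for (i = 0; i < 5; i++)
            inv *= 2 - n * inv;
        return 0 - inv;                    /* -n^{-1} mod 2^64 */
    }

The product of two Montgomery residues $x = ar$ and $y = br$ is then simply redc((unsigned __int128) x * y, n, ninv), which equals $(ab)r$.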

Using this form in the exponentiation function gives a small speedup (8 sec instead of 8.3 for a 300 digit number). I think that if I implement the REDC function at a lower level (GMP has some implementations, but they have no public interface) this will give a further small increase in speed. Moreover, I do not use REDC for the small p = 4, 5, 7, 9 and 16, so the Montgomery machinery is not used for small numbers.

Sunday 5 July 2015

GSoC week 6; Midterm and more optimizations

This week I refactored the powering algorithms and added a sliding window exponentiation implementation. It gave a speed boost of a couple of percent.

The next thing I did was add special squaring functions for small p values: powers of 2 (for 4 and 8), 3, 5 and 7 (for details see the "Implementation of a New Primality Test" supplement). The function selection happens in unity_zp_sqr_inplace(), and we need preallocated memory for an fmpz_t array (t).

It gives a good speed boost (~10-15%), and for "small" numbers (near 100 digits) we have caught up with the PARI/GP implementation; I'm confident that we will soon become faster. For bigger values PARI/GP is still faster. Below are the graphs with timings up to 100 and 300 digits:
[Plot: timings for 50-100 digit numbers]

[Plot: timings for 50-300 digit numbers]


I also looked at the $(n - 1)$ test to replace computations in $\mathbb{Z}[\zeta_r]/(n)$ by computations in $\mathbb{Z}/n\mathbb{Z}$, but I don't see that it would give a good speed gain.
Now I want to try to reduce the $s$ value for fixed $R$ (as in chapter 4 of "Primality Testing and Jacobi Sums"). After we compute $s$ we can try to remove some factors such that we still have $s^2 > n$.

In total, over the past six weeks the APR-CL algorithm has been implemented with good performance. Almost all of the code is tested and documented. Now I need to finish the special cases of multiplication and squaring and then try to optimize in general. We also need to look closely at different initial values of $R$, because the values from PARI/GP hurt the performance (look at the 70 digit numbers in the first plot, for example).

Sunday 28 June 2015

GSoC week 5; Some optimizations


This week I was working on the Jacobi sum test optimizations described in the last post.
First, I added some fixed optimal $R$ values for $n$ with different numbers of digits. I also rewrote the $s$ computation. These functions can be found here. It gave a good gain in speed, but I can add more intermediate $R$ values. This must be done carefully, because with a smaller $R$ we can get a worse $s$. This is shown in the graph; the PARI/GP implementation has many peaks.
[Plot: mean time in seconds]
Next I added the $2^k$-ary powering algorithm. It gave ~20% gain in speed.

I also did a lot of profiling using the valgrind callgrind tool. I found that the computation of Jacobi sums takes little time, so I'm not going to tabulate them.

I implemented the final step of the algorithm under the condition $s^3 > n$ using the divisors in residue classes algorithm. After that everything became slower. The divisors in residue classes algorithm has a good complexity $O((\ln n)^3)$, but we need to call it thousands of times ($R = 2520$ for a 100 digit $n$).

The next graph shows timings for $n$ from 50 to 100 digits:
[Plot: mean time in seconds]

To better understand how the selection of $R$ affects the running time, I looked at the slowest part of the algorithm ($n$-th powering in $\mathbb{Z}[\zeta_r] / (n)$) and counted the number of operations.
This can be seen in the following tables for 100 and 300 digit numbers:
I also attach links to the valgrind outputs for 100 and 300 digit primes.

Next I want to try to use fmpz_vec operations directly, without the fmpz_mod_poly abstraction. After this I want to implement some formulas described in "Implementation of a New Primality Test" for fixed $r = p^k$.

Saturday 20 June 2015

GSoC week 4; Jacobi sum test implementation and further optimizations

This week I implemented the basic version of the Jacobi sum primality test. It is much faster than the test using Gauss sums. For example, proving primality of a 50 digit prime using the Gauss sum test takes ~200 seconds, while the Jacobi sum test takes less than a second. The figure shows timings for numbers of up to 200 digits.
The implementation details are well described in "A Course in Computational Algebraic Number Theory" by H. Cohen and "Implementation of a New Primality Test" by H. Cohen and A. K. Lenstra. I also describe all the steps in the code comments.

The image shows the call graph output of valgrind's callgrind tool for a 100 digit prime:
So, we can see that to improve the speed of the algorithm we must reduce the number of multiplications (which are already well optimized in flint).


Currently I have a very naive implementation of the selection of the $s$ and $R$ values (just increasing $R = 2, 4, 6, \ldots$ until a proper $s$ is found). It can take a lot of time and choose non-optimal parameters. For example, if we set $R = 98280$ it proves primality of a 200 digit number in ~8 seconds instead of 30. With $R = 166320$ we can prove a 300 digit number in ~30 seconds. So I need to rewrite the aprcl_config_init() function.

The next thing to do is to replace the condition $s^2 > n$ by $s^3 > n$. Using "Divisors in Residue Classes" we can then determine the divisors of $n$. It was implemented in flint before, so I only need to add this to the final division step.

I also want to replace my basic powering algorithm in $\mathbb{Z}[\zeta] / (n)$ by the $2^k$-ary method.

And finally, we can precompute and store the Jacobi sums in the configuration structure (for a 300 digit number the computation of Jacobi sums takes ~40% of the time) for testing numbers of similar length.

This week I want to add these modifications and look in more detail at the other open source implementations.

Saturday 13 June 2015

GSoC week 3; Road to Jacobi sum test

This week there are two important results:
  • the Gauss sum test is implemented and works;
  • operations in $\mathbb{Z}[\zeta_{p^k}]$ are written.
The Gauss sum implementation was tested and, as expected, it works slowly. Proving primality of the 50 digit number $29927402397991286489627837734179186385188296382227$ takes nearly 200 seconds.

To speed up the algorithm, Jacobi sums can be used instead of Gauss sums. The Jacobi sum $J(\chi_1, \chi_2)$ is defined by: $$J(\chi_1, \chi_2) = \sum\limits_{x \in \mathbb{Z}/q\mathbb{Z}}{\chi_1(x)\chi_2(1-x)}$$ We have $J(\chi_1, \chi_2) \in \mathbb{Z}[\zeta_p]$, a much smaller ring, so the computation is much faster. As before, we represent elements of $\mathbb{Z}[\zeta_p]$ as elements of $\mathbb{Z}[X]/(X^p-1)$. I implemented multiplication, addition, memory management, powering and reduction modulo the cyclotomic polynomial $\Phi_{p^k}(x) = \sum\limits_{i = 0}^{p - 1}{x^{ip^{k-1}}}$. All these operations are documented and tested.
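For example, for $p^k = 9$ we get $\Phi_9(x) = x^6 + x^3 + 1$, so in the reduction step every occurrence of $x^6$ is rewritten as $-x^3 - 1$ (and, e.g., $x^7$ becomes $-x^4 - x$).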

I also implemented the computation of Jacobi sums as described in "Implementation of a New Primality Test" by H. Cohen and A. K. Lenstra.

This week I am going to finish the simple version of the Jacobi sum test. Then we can implement some optimizations described in "Implementation of a New Primality Test" and, maybe, think about new ones.

Saturday 6 June 2015

GSoC week 2 and Gauss sums

As described earlier, we need to implement operations in $\mathbb{Z}[\zeta_p, \zeta_q]$. We can represent $x$ from $\mathbb{Z}[\zeta_p, \zeta_q]$ as a vector of size $p$ with entries in $\mathbb{Z}[\zeta_q] = \mathbb{Z}[X]/(X^q-1)$.

The addition of $x, y \in \mathbb{Z}[\zeta_p, \zeta_q]$ is just elementwise addition of the polys: $x[i] + y[i]$ for $i \in [0, p-1]$.
For the multiplication $z = x * y$ in $\mathbb{Z}[\zeta_p, \zeta_q]$ we set
$$z[i + j \pmod{p}] \mathrel{+}= x[i]*y[j] \pmod{X^q-1} \text{ for } i,j<p$$
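A minimal sketch of this multiplication over small coefficient arrays (P, Q, the element layout and the plain long arithmetic are simplifications of mine; the real structure holds fmpz_mod_poly values):

    #include <string.h>

    #define P 3
    #define Q 7

    /* An element of Z[zeta_p, zeta_q]/(n): P polynomials of length Q,
       c[i][a] is the coefficient of zeta_p^i * zeta_q^a. */
    typedef struct { long c[P][Q]; } elem_t;

    /* z = x * y: the zeta_p exponents add modulo P, and each product of
       polys is reduced mod X^Q - 1, i.e. zeta_q exponents add modulo Q. */
    static void elem_mul(elem_t *z, const elem_t *x, const elem_t *y, long n)
    {
        int i, j, a, b;
        memset(z->c, 0, sizeof(z->c));
        for (i = 0; i < P; i++)
            for (j = 0; j < P; j++)
                for (a = 0; a < Q; a++)
                    for (b = 0; b < Q; b++)
                        z->c[(i + j) % P][(a + b) % Q] =
                            (z->c[(i + j) % P][(a + b) % Q]
                             + x->c[i][a] * y->c[j][b]) % n;
    }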

It will also be useful to have memory management, comparison, powering and multiplication by $\zeta_p^k$ functions.

Now that we have a structure to work with the elements of $\mathbb{Z}[\zeta_p,\zeta_q]$, we can compute the Gauss sum $\tau(\chi)$ for a Dirichlet character $\chi_{p, q}$: $$\tau(\chi)=\sum\limits_{x \in (\mathbb{Z}/q\mathbb{Z})^*}{\chi(x)\zeta_q^x}=\sum\limits_{j=1}^{q-1}{\zeta_p^j\zeta_q^{g^j}}$$ where $g$ is a primitive root modulo $q$.

So, we can now compute Gauss sums. To test $n$ for primality we must check the two conditions of Theorem 4.2 from "Four primality testing algorithms" for all $q$ and prime powers $r|q-1$:
  • for every prime $t$ dividing $n$ there exists $\lambda_t$ in the ring $\mathbb{Z}_l$ of $l$-adic integers such that: $$t^{l-1}=n^{(l-1)\lambda_t} \in \mathbb{Z}_l^*$$
  • $\tau^{\sigma_n-n}(\chi)  \in \langle\zeta_p\rangle$ in the ring $\mathbb{Z}[\zeta_p, \zeta_q]/(n)$
$\tau^{\sigma_n}(\chi)=\tau(\chi^n)$, so the second condition can be checked as:
$$\tau(\chi^n)=\zeta_p^i*\tau^n(\chi) \text{ for some }i$$
An example of how this works can be found in the unit test for Gauss sums and in the Gauss sum primality test.

The first condition  can be checked using Proposition 4.3 and Claim 2.

When I finally implement the first condition, the Gauss sum test will be finished and we can continue on to APR-CL. To begin with, I need to rewrite the operations in $\mathbb{Z}[\zeta_p]$ more carefully.

Thursday 28 May 2015

Cyclotomic primality testing algorithm

Our final goal is to implement the APR-CL algorithm, but let's start with something simpler. The cyclotomic primality testing algorithm described in "Four primality testing algorithms" is well suited for this goal. This algorithm is based on the theorem described in a previous post.


Let $n$ be the number to test for primality. The algorithm can be divided into three stages:
  1. Preparation. We need to select integers $R$ and $s$ such that $s=\prod\limits_{\substack{q-1|R \\ q \text{ prime}}}q$, $s > \sqrt{n}$ and $R$ is as small as possible.
  2. Gauss sum tests. For all primes $q$ dividing $s$ and for each prime power $r$ dividing $q - 1$ we make sure that $gcd(n, qr) = 1$ and then check the conditions on the Gauss sum for the character $\chi:(\mathbb{Z}/q\mathbb{Z})^*\to\mu_r$.
  3. Divisor search. For each integer $k < R$ check whether $n^k \bmod s$ divides $n$. If no nontrivial divisor is found this way, then $n$ is prime.
The Gauss sum is $\tau(\chi) = - \sum\limits_{x \in (\mathbb{Z}/q\mathbb{Z})^*} {\chi(x)\zeta_q^x}$. The character $\chi$ can be constructed by the following formula: $\chi(g^m)=\zeta_r^m$ for each integer $m$, where $g$ is a primitive root modulo $q$.

So the Gauss sum $\tau(\chi) \in \mathbb{Z}[\zeta_q, \zeta_p]$. This structure can be represented as a matrix or as a vector of polynomials (a vector of $\mathbb{Z}[\zeta]$ elements, implemented here). That's what I'm going to do in the next few days.

We now have an implementation of the function $f(m)$ such that $\chi(m)=\zeta_r^{f(m)}$. We also have an implementation of the first step of the algorithm.

When I implement operations in $\mathbb{Z}[\zeta_q,\zeta_p]$, we will be able to check the two conditions of Theorem 4.2 from the "Four primality testing algorithms" paper.



Friday 27 March 2015

Key ingredient for the cyclotomic primality test

Let $n$ be a natural number. Let $q$ be a prime not dividing $n$, let $r$ be a power of a prime number $l$ not dividing $n$, and let $\chi:(\mathbb{Z}/q\mathbb{Z})^*\to\mu_r$ be a character.
If:
  1. for every prime $p$ dividing $n$ there exists $\lambda_p$ in the ring of $l$-adic integers such that $$p^{l-1}=n^{(l-1)\lambda_p}$$ in the multiplicative group of $l$-adic integers;
  2. the Gaussian sum $\tau(\chi)$ satisfies $$\tau(\chi)^{\sigma_n-n}\in \langle\xi_r\rangle \text{ in the ring } \mathbb{Z}[\xi_r,\xi_q]/(n),$$ where $\langle\xi_r\rangle$ is the cyclic subgroup of $(\mathbb{Z}[\xi_r]/(n))^*$ of order $r$ generated by $\xi_r$.
Then we have $$ \chi(p)= \chi(n)^{\lambda_p} $$ for every prime divisor $p$ of $n$.
By $\mu_r$ we denote the subgroup of $r$-th roots of unity of $\bar{\mathbb{Q}}^*$.

Four primality testing algorithms

Wednesday 25 March 2015

Primality testing and APR-CL algorithm.


There is a simple method for verifying that a number is composite without factoring it. According to Fermat's little theorem, if the number $n$ is prime, then for any integer $a$ not divisible by $n$: $$a^{n-1} \equiv 1 \pmod n$$ The first problem is that checking all possible $a$ takes too long; the second and main one is that there are infinitely many numbers (Carmichael numbers) on which the test is wrong. In practice we can take many different $a$, and if the condition is satisfied, then most likely the number is prime.
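A minimal sketch of this test for word-sized $n$ (plain C; mulmod is a helper of mine):

    #include <stdint.h>

    static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n)
    {
        return (uint64_t)(((unsigned __int128) a * b) % n);
    }

    /* Fermat test to base a: returns 0 if n is certainly composite,
       1 if n is a probable prime to base a (it may still be composite,
       e.g. when n is a Carmichael number). */
    static int fermat_test(uint64_t n, uint64_t a)
    {
        uint64_t res = 1, e = n - 1;
        a %= n;
        while (e != 0)                 /* binary powering: a^(n-1) mod n */
        {
            if (e & 1)
                res = mulmod(res, a, n);
            a = mulmod(a, a, n);
            e >>= 1;
        }
        return res == 1;
    }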

We see that in the general case we need a stronger condition. For example, we can use: $$a^{\frac{n-1}{2}} \equiv \left(\dfrac{a}{n}\right) \pmod n \text{ for all } a \in \mathbb{N} \text{ with } gcd(a, n) = 1$$ This is true for odd prime $n$. Another strengthening of Fermat's theorem: $$(a+b)^n \equiv a^n + b^n \pmod{nR} \text{ for all } a \text{, } b \in R$$ where $R$ is any commutative ring. These tests are correct where the first algorithm is wrong, but we still have a computational problem. And that is the problem that the APR-CL algorithm solves.

APR-CL by Cohen and Lenstra is an improvement of the APR algorithm. The important idea of APR-CL is that we not only check the conditions, but also gather information about the possible divisors of $n$. In the end we can conclude that only $n$ and $1$ divide $n$. Now I will describe the common "workflow" of the algorithm.
The algorithm is divided into three stages:

Stage 1. Preparation.
We need to select integer values $s$ and $t$ with the following properties:
  • $t$ is "small",
  • $s > n^{1/2}$ or $s > n^{1/3}$,
  • $a^{t} \equiv 1 \pmod s$ for all $a \in \mathbb{Z}$ with $gcd(a, s) = 1$,
  • the factorizations of $t$ and $s$ are known.
This part can be done once and then used for checking different numbers.
Here we also check that $gcd(n, st) = 1$; otherwise $n$ is composite.

Stage 2. Tests.
We test the number using tests similar to those described above. If one of the tests fails, then $n$ is composite. If all tests pass, we can use the obtained information to prove that for every divisor $r$ of $n$ there exists an integer $i < t$ such that $r \equiv n^i \pmod s$.
This is the most difficult and most algebraic part of the algorithm. How do we obtain, and most importantly combine, this information?
First, for every prime power $p^k$ dividing $t$ we select an ideal of $\mathbb{Z}[\xi_{p^k}]$ (for example, $n\mathbb{Z}[\xi_{p^k}]$).
Next, denote $$Y_q = \{\chi_{p, q} : p \text{ prime, } p|q-1\} \text{ and } Y_s = \bigcup_{\substack{q|s, q \text{ prime}}}Y_q$$ and check that every $\chi \in Y_s$ satisfies a certain condition on the Gauss sum $\tau (\chi)$. We can reformulate this in terms of Jacobi sums for better performance. Finally, we check that every prime $p$ dividing $t$ satisfies:

for every prime divisor $r$ of $n$ there exists $l_p(r) \in \mathbb{Z}_p$ such that $r^{p-1}=(n^{p-1})^{l_p(r)}$ in the group $1 + p\mathbb{Z}_p$. 

If this is done, we can prove that for every divisor $r$ of $n$ there exists an integer $i < t$ such that $r \equiv n^i \pmod s$.

Stage 3. Conclusion. 
Using the obtained information we can fully factor $n$. Practically always at this stage we see that the only divisors of $n$ are $1$ and $n$.
To factor $n$ completely we only need to find all divisors $r \leq n^{1/2} < s$ of $n$. Each such $r$ is congruent to $n^i \pmod s$ for some $i < t$, so defining $r_i \equiv n^i \pmod s$ with $0 \leq r_i < s$ for $0 \leq i < t$, we can check which of the $r_i$ divide $n$.

As we can see, it is to our advantage to take smaller values of $t$ and $s$.
In the next post I will describe the implementation of the algorithm proposed in "Primality Testing and Jacobi Sums" by Cohen and Lenstra.
