The Complexity Of Primality

Huynh Thiet Dung <[email protected]>
Nov 8, 1994

Primality testing and prime factorization are ancient problems that have attracted the special attention of mathematicians for centuries. Number theory is traditionally believed to be the purest of all sciences, and within it the hunt for large primes and for factors of large numbers has always seemed particularly remote from applications, even to questions of a number-theoretic nature. However, recent developments in complexity theory and cryptology have altered this state of affairs, as computational number theory finds applications in public-key cryptosystems such as the RSA (Rivest-Shamir-Adleman) scheme. The RSA cryptographic system is based on the assumption that prime factorization and computing discrete logarithms are computationally intractable. This brief summary concerns the primality problem only.

PRIMALITY is the problem of determining whether a given integer P (written in binary) is prime; its complement, COMPOSITENESS, is the problem of determining whether P is a composite number. Although the two problems are closely related, in that a deterministic algorithm for one immediately yields a deterministic algorithm for the other, it still makes sense to consider each of them separately in the framework of complexity theory.

What is the complexity of PRIMALITY? As most of us know, existing algorithms become very inefficient when P is large. Indeed, if we measure their running times in terms of the length n of the binary representation of P (i.e. n is about log2 P), then all these methods require more than a polynomial number of steps. Such computations are said to be infeasible. Therefore, the question of whether PRIMALITY can be solved in polynomial time has been an important open problem in computational number theory for quite some time. In fact, the exact complexity of PRIMALITY has been posed as an open problem since the advent of computational complexity theory in the late 60s.
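To make the point concrete, here is a minimal sketch of trial division, the classical method (Python is my choice for all illustrations in this summary; the code and its names are mine, not part of the original discussion). It examines candidate divisors up to the square root of P, so for an n-bit input it may perform on the order of 2^(n/2) divisions, which is exponential in n.

    def is_prime_trial_division(P):
        # Classical trial division: try every candidate divisor d <= sqrt(P).
        # For an n-bit number this loop may run about 2^(n/2) times,
        # which is exponential in the input length n.
        if P < 2:
            return False
        d = 2
        while d * d <= P:
            if P % d == 0:
                return False      # d is a nontrivial divisor, so P is composite
            d += 1
        return True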

Following are the most outstanding known results concerning the complexity of PRIMALITY:

(1) Assuming that the Extended Riemann Hypothesis holds true, PRIMALITY can be solved in polynomial time (Miller, 1976). The Riemann Hypothesis is a number-theoretic conjecture concerning the roots of the Riemann zeta function, defined for Re(s) > 1 by zeta(s) = 1 + (1/2^s) + (1/3^s) + ... and extended to the rest of the complex plane by analytic continuation. It postulates that the nontrivial zeros of the zeta function all have real part exactly 1/2. It is worth noting that this hypothesis is part of problem #8 in Hilbert's 1900 lecture on the most important open problems in mathematics.
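As a rough illustration of result (1), the following Python sketch follows the shape of Miller's ERH-conditional test: it runs the strong-pseudoprime check for every base a up to 2*(ln P)^2 (the explicit bound proved by Bach under the ERH; the exact constant is an assumption of this sketch, not something stated above). If the ERH holds, every odd composite P has a witness in that small range, so the whole procedure runs in time polynomial in n.

    import math

    def strong_pseudoprime_witness(P, a):
        # Returns True if base a witnesses the compositeness of odd P > 2.
        d, s = P - 1, 0
        while d % 2 == 0:
            d //= 2
            s += 1
        x = pow(a, d, P)
        if x == 1 or x == P - 1:
            return False
        for _ in range(s - 1):
            x = (x * x) % P
            if x == P - 1:
                return False
        return True

    def miller_erh_test(P):
        # Deterministic test, correct provided the Extended Riemann Hypothesis holds.
        # The bound 2*(ln P)^2 on the bases is the commonly quoted one (Bach).
        if P < 2:
            return False
        if P in (2, 3):
            return True
        if P % 2 == 0:
            return False
        bound = min(P - 1, int(2 * math.log(P) ** 2))
        for a in range(2, bound + 1):
            if strong_pseudoprime_witness(P, a):
                return False      # a definite witness: P is composite
        return True               # prime, assuming the ERH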

(2) There is a fast Monte Carlo algorithm for COMPOSITENESS (Solovay & Strassen, 1977) as well as a fast Monte Carlo algorithm for PRIMALITY (Adleman & Huang, 1987). These two algorithms can be combined to obtain a Las Vegas algorithm for PRIMALITY (or COMPOSITENESS).

It seems appropriate to give a brief explanation of what Monte Carlo and Las Vegas algorithms are. Monte Carlo algorithms may produce wrong answers, but they err with small probability. The error probability can be made arbitrarily small by repeating the algorithm sufficiently many times.

Solovay & Strassen's paper (1977) is the pioneering work that stirred much interest in research on randomized computations. The idea is truly remarkable. After all, computation itself is inherently randomized: the probability of computer failure is nonzero, no matter how small it is. So if a randomized algorithm has an extremely small error probability, then it is quite useful indeed. Besides, randomized algorithms may be very helpful in the design of deterministic algorithms.

A Monte Carlo (randomized) algorithm for COMPOSITENESS essentially has the following form. It takes as input the n-bit integer P to be tested together with a sequence R of n random bits generated from some random source. Depending on R, the algorithm performs a computation on the integer P and outputs "Yes" if R provides a witness for the compositeness of P. If P is prime, then no R can ever produce a witness, and the algorithm always outputs "No". However, if P is composite, some choices of R may fail to produce a witness, and the algorithm's output "No" is false in that case.
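As an illustration of this form, here is a sketch of a single round of the Solovay-Strassen test in Python; the random base a plays the role of the random bits R, and the function and variable names are mine. A base a is a witness when gcd(a, P) > 1 or when a^((P-1)/2) is not congruent to the Jacobi symbol (a/P) modulo P, which Euler's criterion guarantees cannot happen for a prime P.

    import random
    from math import gcd

    def jacobi(a, n):
        # Jacobi symbol (a/n) for odd n > 0, computed via quadratic reciprocity.
        a %= n
        result = 1
        while a != 0:
            while a % 2 == 0:
                a //= 2
                if n % 8 in (3, 5):
                    result = -result
            a, n = n, a
            if a % 4 == 3 and n % 4 == 3:
                result = -result
            a %= n
        return result if n == 1 else 0

    def solovay_strassen_round(P):
        # One Monte Carlo round for COMPOSITENESS (odd P > 3 assumed).
        # "Yes" (True) is definite: P is composite.  "No" (False) may be wrong.
        a = random.randrange(2, P - 1)         # the random choice R
        if gcd(a, P) > 1:
            return True                        # a shares a factor with P: witness
        j = jacobi(a, P) % P                   # map -1 to P - 1
        return pow(a, (P - 1) // 2, P) != j    # Euler's criterion fails: witness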

The important fact is that if P is composite, then at least half of the n-bit sequences R provide a witness for the compositeness of P. So the error probability is at most 1/2, and it can be made arbitrarily small by running the algorithm a number of times. Thus, if this Monte Carlo algorithm outputs "Yes", the answer is definite: P is indeed composite. One should note the asymmetry of Monte Carlo algorithms: they err on one side only. This is one of the many reasons why we need to consider COMPOSITENESS and PRIMALITY separately.
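A minimal sketch of this amplification, written to accept any one-sided round function (for example the solovay_strassen_round above, a name of my own choosing): after k independent rounds with no witness found, the probability that a composite P has slipped through is at most (1/2)^k.

    def probably_prime(P, one_round, k=50):
        # one_round(P) returns True only when it finds a compositeness witness.
        # A single "Yes" settles the matter; k failures to find a witness leave
        # an error probability of at most 2**-k.
        for _ in range(k):
            if one_round(P):
                return False       # definite: P is composite
        return True                # "probably prime": wrong with prob. <= 2**-k

For instance, probably_prime(P, solovay_strassen_round, 50) declares a composite P "probably prime" with probability at most 2**-50.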

Now, suppose that A1 is a Monte Carlo algorithm for COMPOSITENESS and A2 is another one for PRIMALITY. Then we can run both algorithms repeatedly and wait for a definite answer: "Yes, P is composite" from A1 or "Yes, P is prime" from A2. When we get such an answer, it is definite and we are sure about P. The problem is that we may not obtain a definite answer after k runs of both algorithms, but this is very unlikely if k is sufficiently large. The combined algorithm is called a Las Vegas algorithm: if it outputs an answer, the answer is always correct; and the probability that it eventually outputs an answer can be made arbitrarily high by letting k be large enough.
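The combination can be sketched as follows. Here a1_round would be a compositeness round such as the one above, while a2_round stands in for a primality-witness round (such as Adleman & Huang's), which I have not implemented; both names are placeholders of mine.

    def las_vegas_primality(P, a1_round, a2_round, k=100):
        # a1_round(P) -> True means a definite witness that P is composite.
        # a2_round(P) -> True means a definite witness that P is prime.
        # Any answer returned here is correct; only with very small probability
        # do we give up after k rounds and return None ("don't know").
        for _ in range(k):
            if a1_round(P):
                return "composite"
            if a2_round(P):
                return "prime"
        return None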

As far as prime factorization is concerned, not much is known except the fact that it can be solved in polynomial time by NONDETERMINISTIC algorithms. To the best of my knowledge, there are currently no polynomial-time randomized algorithms for prime factorization, and the RSA scheme is based on the assumption that no such algorithms exist. Whether this assumption is a legitimate one is certainly debatable. In any case, the problem has resisted numerous attempts by mathematicians over the centuries.
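What "solvable in polynomial time by NONDETERMINISTIC algorithms" means here is simply that a correctly guessed factor can be checked quickly; the following one-line check is my own illustration, not part of the original note.

    def verify_factor(P, d):
        # A nondeterministic algorithm may "guess" d; verifying the guess is easy:
        # one division decides whether d is a nontrivial factor of P.
        return 1 < d < P and P % d == 0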