First download gmp-5.0.1, uncompress it, and go in the gmp-5.0.1 directory.
$ patch -p 0 -i /tmp/fft_patch.gmp-5.0.1 (assuming the patch is in /tmp)
patching file mpn/generic/mul_fft.c
patching file gmp-h.in
patching file tfft.c
$ ./configure; make
$ gcc -I. tfft.c .libs/libgmp.a
$ ./a.out 1000 1000 1500 # performs a product of 1000 limbs by 1000 limbs,
# with a result mod B^r+1 with r>=1500
# and B=2^BITS_PER_MP_LIMB
r=1536 # prints the actual value of r
This patch adds the following interface to GMP (it does not change anything
in the existing interface):
A new structure "mp_fft_t" is defined:
typedef struct
{
mp_size_t n; /* multiplication is performed mod 2^(n*GMP_NUMB_BITS)+1 */
unsigned long k; /* fft length is K = 2^k */
unsigned long l; /* allow to accumulate up to 2^l products */
unsigned long M;
mp_size_t nprime;
mp_ptr *d; /* array of K arrays of nprime+1 limbs */
} __mp_fft_struct;
typedef __mp_fft_struct mp_fft_t[1];
Seven new functions are exported (their interface is preliminary and may
change, please send any comments to zimmerma at loria dot fr):
mp_size_t mpn_fft_init (mp_fft_t r, mp_size_t n0, unsigned long l)
Initializes a FFT structure r, to perform multiplication
modulo 2^(n*GMP_NUMB_BITS)+1, with n >= n0. Return the value of n.
The 3rd argument indicates that up to 2^l products may be accumulated
in the Fourier domain; in usual cases l = 0.
void mpn_fft_transform (mp_fft_t r, mp_ptr ap, mp_size_t an)
Puts in the FFT structure r the transform of {ap, an}.
We should have an <= n, where n is the value returned when initializing r.
void mpn_fft_mult (mp_fft_t r, mp_fft_t s)
Puts in the FFT structure r the term to term product of r and s.
void mpn_fft_add (mp_fft_t r, mp_fft_t s)
Puts in the FFT structure r the term to term sum of r and s.
void mpn_fft_sub (mp_fft_t r, mp_fft_t s)
Puts in the FFT structure r the term to term difference of r and s.
void mpn_fft_itransform (mp_ptr ap, mp_fft_t r)
Puts in {ap, n+1} the inverse transform of the FFT structure r,
where n is the value returned when initializing r.
void mpn_fft_clear (mp_fft_t r)
Frees the memory allocated by the FFT structure r.
Examples:
1) a multiplication call mpn_mul (rp, ap, an, bp, bn) can be replaced by:
mp_fft_t at, bt;
mp_ptr tp;
mp_size_t m;
m = mpn_fft_init (at, an + bn, 0);
mpn_fft_init (bt, an + bn, 0);
mpn_fft_transform (at, ap, an);
mpn_fft_transform (bt, bp, bn);
mpn_fft_mult (at, bt);
tp = malloc ((m + 1) * sizeof (mp_limb_t));
mpn_fft_itransform (tp, at);
MPN_COPY (rp, tp, an + bn);
mpn_fft_clear (at);
mpn_fft_clear (bt);
free (tp);
2) in some cases, we want to reuse several times a transformed representation.
However, mpn_fft_mult() destroys its first argument. Here is a copy
function, which assumes that both r and s were initialized with the same
2nd argument of mpn_fft_init:
/* copy s into r, assumes that r was initialized with mpn_fft_init(r, n),
with the same 'n' than for s */
void
mpn_fft_copy (mp_fft_t r, mp_fft_t s)
{
MPN_COPY (r->d[0], s->d[0], s->K * (s->nprime + 1));
}