[2020] [2019]
[2018] [2017]
[2016] [2015]
[2014] [2013]
[2012] [2011]
[2010] [2009]
[2008] [2007]
[2006] [2005]
[2004] [2003]
[2002] [2001]
[2000] [1999]
[1998] [1997]
[1996]
[1994] [1993]
[1991]
Note: the years below correspond to the first preprint, not to the final
publication dates.
2020:
Accuracy of Mathematical Functions in Single Precision, note, 2 pages, February 2020.

For single precision (IEEE754 binary32), we give the largest error in ulps
(units in last place) of the mathematical function the GNU libc 2.3.1,
released on February 1st, 2020.
Those values were obtained by exhaustive search over the full binary32 range.
2019:
Recovering Hidden SNFS Polynomials, note, 2 pages,
October 2019.

Given an integer N constructed with an SNFS trapdoor, i.e., such that N = Res(f,g) with f = a_{d}x^{d} + a_{d1}x^{d1} + ... + a_{0} having small coefficients a_{i} = O(B), and g = lx  m, we can recover f and g in O(B F(l)) arithmetic operations, assuming B^{2} l^{2} ≪ a_{d} m, where F(l) is the number of arithmetic operations to find a prime factor l. This partially answers an open problem from [1].
A New Ranking Function for Polynomial Selection in the Number Field Sieve, with Nicolas David, preprint, 8 pages, June 2019. To appear in "75 Years of Mathematics of Computation",
Susanne C. Brenner, Igor Shparlinski, ChiWang Shu,
Daniel B. Szyld, eds.,
Contemporary Mathematics, volume 754,
American Mathematical Society.

This article explains why the classical MurphyE ranking function might fail to correctly rank polynomial pairs in the Number Field Sieve, and proposes a new ranking function.
Parallel Structured Gaussian Elimination for the Number Field Sieve, with Charles Bouillaguet, preprint, 11 pages, April 2019.

This article describes a parallel algorithm for the Structured Gaussian Elimination step of the Number Field Sieve (NFS). NFS is the
best known method for factoring large integers and computing
discrete logarithms.
Stateoftheart algorithms for this kind of partial sparse elimination, as implemented in the CADONFS software tool, were
unamenable to parallel implementations. We therefore designed a
new algorithm from scratch with this objective and implemented it
using OpenMP. The result is not only faster sequentially, but scales
reasonably well: using 32 cores, the time needed to process two
landmark instances went down from 38 minutes to 20 seconds and
from 6.7 hours to 2.3 minutes, respectively.
Imperfect Forward Secrecy: How DiffieHellman Fails in Practice, with David Adrian, Karthikeyan Bhargavan, Zakir Durumeric, Pierrick Gaudry, Matthew Green, J. Alex Halderman, Nadia Heninger, Drew Springall, Emmanuel Thomé, Luke Valenta, Benjamin VanderSloot, Eric Wustrow, and Santiago ZanellaBéguelin, Research Highlights of Communications of the ACM,
volume 62, number 1, pages 106114, January 2019.
[CACM page]
[HAL]

We investigate the security of DiffieHellman key exchange
as used in popular Internet protocols and find it to be less
secure than widely believed. First, we present Logjam, a
novel flaw in TLS that lets a maninthemiddle downgrade
connections to "exportgrade" DiffieHellman. To carry out
this attack, we implement the number field sieve discrete
logarithm algorithm. After a weeklong precomputation for
a specified 512bit group, we can compute arbitrary discrete
logarithms in that group in about a minute. We find that
82% of vulnerable servers use a single 512bit group, and
that 8.4% of Alexa Top Million HTTPS sites are vulnerable
to the attack.a In response, major browsers have changed to
reject short groups.
We go on to consider DiffieHellman with 768 and 1024
bit groups. We estimate that even in the 1024bit case, the
computations are plausible given nationstate resources.
A small number of fixed or standardized groups are used by
millions of servers; performing precomputation for a single
1024bit group would allow passive eavesdropping on 18%
of popular HTTPS sites, and a second group would allow
decryption of traffic to 66% of IPsec VPNs and 26% of SSH
servers. A close reading of published NSA leaks shows that
the agency's attacks on VPNs are consistent with having
achieved such a break. We conclude that moving to stronger
key exchange methods should be a priority for the Internet
community.
2018:
On various ways to split a floatingpoint number,
with ClaudePierre Jeannerod and JeanMichel Muller, proceedings of
ARITH'25, June 2018.
[HAL]
Computational
Mathematics with SageMath, with Alexandre Casamayou, Nathann Cohen,
Guillaume Connan, Thierry Dumont, Laurent Fousse, François Maltey, Matthias Meulien,
Marc Mezzarobba, Clément Pernet, Nicolas M. Thiéry, Erik Bray, John Cremona,
Marcelo Forets, Alexandru Ghitza, Hugh Thomas, SIAM textbook, 2018.
[HAL]
 This is the english translation of the book "Calcul mathématique avec
Sage" which was published in 2013.
The examples were updated from Sage 5.9 to Sage 8.3.
2017:
FFT extension for algebraicgroup factorization algorithms, with Richard P. Brent and Alexander Kruppa, chapter of the book
Topics in Computational
Number Theory Inspired by Peter L. Montgomery, Cambridge University Press,
2017
[HAL].
Other chapters of the book are available online on
this page.
Optimized Binary64 and Binary128 Arithmetic with GNU MPFR, with Vincent Lefèvre, proceedings of the
24th IEEE Symposium on Computer
Arithmetic (ARITH 24), London, UK, July 2426, 2017
[HAL].

We describe algorithms used to optimize the GNU MPFR library when the operands fit into one or two words.
On modern processors, a correctly rounded
addition of two quadruple precision numbers is now performed in 22 cycles,
a subtraction in 24 cycles, a multiplication in 32 cycles,
a division in 64 cycles, and a square root in 69 cycles.
We also introduce a new faithful rounding mode, which enables even
faster computations.
Those optimizations will be available in version 4 of MPFR.
2016:
Computing the ρ constant, with Jérémie Detrey and
PierreJean Spaenlehauer, preprint, 3 pages, October 2016.
Factorisation of RSA220 with CADONFS, with Shi Bai,
Pierrick Gaudry, Alexander Kruppa and Emmanuel Thomé, 3 pages, May 2016.
[HAL].
RSA220 is part of the
RSA Factoring
Challenge.

We report on the factorization of
RSA220 (220 decimal digits), which
is the 3rd largest integer factorization with the
General Number Field Sieve (GNFS), after the factorization of RSA768
(232 digits) in December 2009, and that of
3^{697}+1 (221 digits) in February 2015 by NFS@home.
Twelve New Primitive Binary Trinomials,
with Richard P. Brent, preprint, 2 pages, 2016
[arxiv,
HAL]

We exhibit twelve new primitive
trinomials over GF(2)
of record degrees 42,643,801, 43,112,609,
and 74,207,281.
In addition we report the first Mersenne exponent not ruled
out by Swan's theorem  namely 57,885,161 
for which none primitive trinomial exists.
This completes the search for the currently known Mersenne prime exponents.
2015:
Imperfect Forward Secrecy: How DiffieHellman Fails in Practice, with David Adrian, Karthikeyan Bhargavan, Zakir Durumeric, Pierrick Gaudry, Matthew Green, J. Alex Halderman, Nadia Heninger, Drew Springall, Emmanuel Thomé, Luke Valenta, Benjamin VanderSloot, Eric Wustrow, Santiago ZanellaBeguelin, May 2015, to appear in
the proceedings of CCS 2015.
[HAL]
[talk slides].
Best paper award.
 We investigate the security of DiffieHellman key exchange as
used in popular Internet protocols and find it to be less secure
than widely believed. First, we present a novel flaw in TLS
that allows a maninthemiddle to downgrade connections
to exportgrade DiffieHellman. To carry out this attack,
we implement the number field sieve discrete log algorithm.
After a weeklong precomputation for a specified 512bit
group, we can compute arbitrary discrete logs in this group
in minutes. We find that 82% of vulnerable servers use a
single 512bit group, allowing us to compromise connections
to 7% of Alexa Top Million HTTPS sites. In response, major
browsers are being changed to reject short groups.
More details here.
See also the CNRS press releases in
english and
french.
Note added on June 21st, 2016: our estimation for DH768 in Table 2 was
pessimistic, since Thorsten Kleinjung did
such a computation with
4000 coreyears of sieving (instead of 8000), 900 core years for linear
algebra (instead of 28,500) for a matrix of 24M rows (instead of 150M).
The DH1024 estimation is probably pessimistic too.
Automatic Analysis, 5 pages. This is a preliminary
version of a chapter for a book about collected works of Philippe Flajolet.
I wrote that chapter in March to July 2012. While the book is not yet
published, I make this text available here.
Magic Squares of Squares, with Paul Pierrat and
François Thiriet, 2015.

We give modular properties and new classes of potential solutions for
magic squares of squares and similar problems.
2014:
Beyond Double Precision, research
project submitted as Advanced Grant
to the European Research
Council, October 2014.
This project was judged ``of high quality but not sufficient to
pass to Step 2 of the evaluation''.
Better Polynomials for GNFS,
with Shi Bai, Cyril Bouvier, and Alexander Kruppa,
Mathematics of Computation, volume 85, pages 861873, 2016
(preprint from September 2014).
[HAL]

The general number field sieve (GNFS) is the most efficient algorithm known
for factoring large integers. It consists of several stages, the first
one being polynomial selection. The quality of the selected polynomials can
be modelled in terms of size and root properties. We propose a new kind of
polynomials for GNFS: with a new degree of freedom, we further improve the
size property. We demonstrate the efficiency of our algorithm by exhibiting a
better polynomial than the one used for the factorization of RSA768, and a
polynomial for RSA1024 that outperforms the best published one.
Note: the sizeoptimized RSA1024 polynomial B_{1024} can be
reproduced using CADONFS using the command sopt n 135...563 f rsa1024.poly1 d 6 with that rsa1024.poly1 file.
2013:
Calcul mathématique avec
Sage [in french] with Alexandre Casamayou, Guillaume Connan, Thierry
Dumont, Laurent Fousse, François Maltey, Matthias Meulien, Marc Mezzarobba,
Clément Pernet and Nicolas M. Thiéry, 2010
[HAL].
DivisionFree BinarytoDecimal Conversion,
with Cyril Bouvier, IEEE Transactions on Computers, volume 63, number 8,
pages 18951901, August 2014.
[HAL]

This article presents algorithms that convert multiple precision
integer or floatingpoint numbers from radix
2 to radix 10 (or to any radix b > 2).
Those algorithms, based on the ``scaled remainder tree'' technique,
use multiplications instead of divisions in their critical part.
Both quadratic and subquadratic algorithms are detailed, with
proofs of correctness.
Experimental results show that our implementation of those algorithms
outperforms the GMP library by up to 50% (using the same lowlevel routines).
Discrete logarithm in GF(2^{809}) with FFS, with Razvan Barbulescu, Cyril Bouvier, Jérémie Detrey, Pierrick Gaudry, Hamza Jeljeli, Emmanuel Thomé and Marion Videau,
proceedings of PKC 2014,
Lecture Notes in Computer Science Volume 8383, pages 221238, 2014.
[HAL]

We give details on solving the discrete logarithm problem in the 202bit
prime order subgroup of GF(2^{809}) using the Function Field
Sieve algorithm (FFS).
2012:
Factorization of
RSA704 with CADONFS, with Shi Bai and Emmanuel Thomé, preprint, July
2012.
[HAL]

We give details of the factorization of RSA704 with CADONFS. This is a
record computation with publicly available software tools.
The aim of
this experiment was to stress CADONFS  which was originally designed
for 512bit factorizations  for larger inputs, and to identify possible
rooms of improvement.
Size Optimization of Sextic Polynomials in the Number
Field Sieve, with Shi Bai, preprint, March 2012, revised June 2013.
[HAL]

The general number field sieve (GNFS) is the most efficient algorithm known for
factoring large integers. It consists of several stages, the first one being
polynomial selection. The quality of the chosen polynomials in polynomial
selection can be modelled in terms of size and root properties. In this paper,
we describe some methods to optimize the size properties of sextic polynomials.
To reproduce the example on page 12, see this Sage file.
Note added January 23, 2015: part of the results of this preprint are given
in Better Polynomials for GNFS (see above).
Avoiding adjustments in modular computations,
preprint, March 2012.

We consider a sequence of operations
(additions, subtractions, multiplications) modulo a fixed
integer N, where only the final value is needed,
therefore intermediate computations might use any representation.
This kind of computation appears for example in number theoretic transforms
(NTT), in stage 1 of the elliptic
curve method for integer factorization,
in modular exponentiation, ...
Our aim is to avoid as much as possible adjustment steps,
which consist in adding or subtracting N, since those steps are useless
in the mathematical sense.
Note added July 17, 2012: the fact of using residues larger than N in
Montgomery multiplication is well known. See for example the article
"Software Implementation of Modular Exponentiation Using Advanced Vector
Instructions Architectures" by Shay Gueron and Vlad Krasnov in the proceedings
of WAIFI 2012, where it is called "Non Reduced Montgomery Multiplication"
(NRMM).
Note added June 26, 2018: for Montgomery multiplication, earlier papers are by
H. Orup ("Simplifying quotient determination in highradix modular multiplication",
Arith 12, 1999) and C. D. Walter ("Montgomery exponentiation needs no final
subtraction", Electronics Letters, 1999).
Finding Optimal Formulae for
Bilinear Maps, with Razvan Barbulescu, Jérémie Detrey and Nicolas Estibals,
proceedings of
WAIFI 2012, Bochum, Germany, July 1619,
LNCS 7369, pages 168186, 2012.
2011:
Maximal Determinants and Saturated Doptimal
Designs of Orders 19 and 37, with Richard P. Brent, William Orrick,
and Judyanne Osborn, 28 pages.

A saturated Doptimal design is a {+1,1} square matrix of given order with maximal determinant. We search for saturated Doptimal designs of orders 19 and 37, and find that known matrices due to Smith, Cohn, Orrick and Solomon are optimal. For order 19 we find all inequivalent saturated Doptimal designs with maximal determinant, 2^{30} * 7^{2} * 17, and confirm that the three known designs comprise a complete set. For order 37 we prove that the maximal determinant is 2^{39} * 3^{36}, and find a sample of inequivalent saturated Doptimal designs. Our method is an extension of that used by Orrick to resolve the previously smallest unknown order of 15; and by Chadjipantelis, Kounias and Moyssiadis to resolve orders 17 and 21. The method is a twostep computation which first searches for candidate Gram matrices and then attempts to decompose them. Using a similar method, we also find the complete spectrum of determinant values for {+1,1} matrices of order 13.
Will Orrick compiles known results about
The Hadamard maximal determinant problem.
Related integer sequences are
A003432 and
A003433.
Note: Richard Brent wrote a paper
showing how to generate many Hadamard
equivalence classes of solutions from a given Gram matrix.
Numerical Approximation of the MasserGramain Constant to
Four Decimal Digits: delta=1.819..., with Guillaume Melquiond and W. Georg
Nowak, Mathematics of Computation, volume 82, number 282, pages 12351246, 2013.
[HAL entry]

We prove that the constant studied by Masser, Gramain, and Weber, satisfies
1.819776 < delta < 1.819833, and disprove a conjecture of Gramain.
This constant is a twodimensional analogue of the EulerMascheroni constant;
it is obtained by computing the radius r_{k} of the smallest disk of
the plane containing k Gaussian integers. While we have used the original
algorithm for smaller values of k, the bounds above come from methods
we developed to obtain guaranteed enclosures for larger values of k.
Ballot stuffing in a postal voting system,
with Véronique Cortier, Jérémie Detrey, Pierrick Gaudry, Frédéric Sur,
Emmanuel Thomé and Mathieu Turuani, proceedings of
REVOTE 2011, International Workshop on
Requirements Engineering for Electronic Voting Systems, Trento, Italy,
August 29, 2011, pages 2736.

We review a postal voting system used in spring 2011 by the French research institute CNRS and designed by a French company (Tagg Informatique).
We explain how the structure of the material can be easily understood out of a few samples of voting material (distributed to the voters), without any prior knowledge of the system.
Taking advantage of some flaws in the design of the system, we show how to perform major ballot stuffing, making possible to change the outcome of the election.
Our attack has been tested and confirmed by the CNRS. A fixed postal voting system has been quickly proposed by Tagg Informatique in collaboration with the CNRS, preventing this attack for the next elections.
Short Division of Long Integers,
with David Harvey, proceedings of the
20th IEEE Symposium on Computer
Arithmetic (ARITH 20), Tuebingen, July 2527, 2011, pages 714.
[HAL entry,
DOI]

We consider the problem of short division  i.e., approximate quotient
 of multipleprecision integers.
We present readytoimplement algorithms that yield an approximation
of the quotient, with tight and rigorous error bounds.
We exhibit speedups of up to 30% with respect to GMP division with remainder,
and up to 10% with respect to GMP short division, with room for further
improvements.
This work enables one to implement fast correctly rounded division routines
in multipleprecision software tools.
Note added July 27, 2011: the algorithm used by GMP 5 (implemented in the
mpn_div_q function) was presented by Torbjörn Granlund in his invited talk
at the ICMS 2006 conference.
2010:
NonLinear
Polynomial Selection for the Number Field
Sieve
[doi],
with Thomas Prest, Journal of Symbolic
Computation, special issue in the honour of Joachim von zur Gathen,
volume 47, number 4, pages 401409, 2012.
[HAL]

We present an algorithm to find two nonlinear polynomials
for the Number Field Sieve integer factorization method.
This algorithm extends Montgomery's "two quadratics" method;
for degree 3, it gives two skewed polynomials with resultant
O(N^{5/4}), which improves on Williams O(N^{4/3}) result.
Note added in June 2011:
Namhun Koo, Gooc Hwa Jo and Soonhak Kwon extended our algorithm in
this preprint.
Note added on September 30, 2011: Nicholas Coxon extended and analysed our
algorithm in
this preprint.
Note added on April 5, 2018: as noticed by Georgina Canny, the polynomials
in the example for Montgomery's two quadratics method should be reversed.
However, the reverse polynomials also work, since they have as common root
1/m modulo N.
Modern Computer Arithmetic,
with Richard Brent, Cambridge University Press, 2010,
our page of
the book,
[HAL entry].

This book collects in the same document all stateoftheart
algorithms in multiple precision arithmetic (integers, integers modulo n,
floatingpoint numbers). The best current reference on that topic is
volume 2 from Knuth's The art of computer programming,
which misses some new important algorithms (divide and conquer division,
other variants of FFT multiplication, floatingpoint algorithms, ...)
Our aim is to give detailed algorithms:
 for all operations (not just multiplication as many text books),
 for all size ranges (not just schoolbook methods or FFTbased methods),
 and including all details (for example how to properly deal with carries
for integer algorithms, or a rigorous analysis of roundoff errors for
floatingpoint algorithms).
The book would be useful for graduate students in computer science and
mathematics (perhaps too specialized for most undergraduates, at least
in its present state), researchers in discrete mathematics, computer
algebra, number theory, cryptography, and developers of multipleprecision
libraries.
Reliable Computing with GNU MPFR,
proceedings of the 3rd
International Congress on Mathematical Software (ICMS 2010), June 2010,
pages 4245, LNCS 6327, Springer.
The original publication is (or will be) available on www.springerlink.com.

This article presents a few applications where reliable computations are
obtained using the GNU MPFR library.
Why and how to use arbitrary precision, with
Kaveh R. Ghazi, Vincent Lefèvre and Philippe Théveny, March 2010,
Computing in Science and Engineering, volume 12, number 3, pages 6265,
2010 (© IEEE).

Most nowadays floatingpoint computations are
done in double precision, i.e., with a significand (or
mantissa) of 53 bits.
However, some applications require more precision:
doubleextended (64 bits or more), quadruple precision (113 bits) or even more.
In an article published in The Astronomical Journal in 2001, Toshio
Fukushima says: In the days of powerful computers,
the errors of numerical integration are the main limitation
in the research of complex dynamical systems,
such as the longterm stability of our solar system
and of some exoplanets [...] and gives an example
where using double precision leads to an accumulated
roundoff error of more than 1 radian for the solar
system! Another example where arbitrary precision
is useful is static analysis of floatingpoint programs
running in electronic control units of aircrafts or in
nuclear reactors.
An O(M(n) log n) algorithm for the
Jacobi symbol, with Richard Brent, January 2010,
Proceedings of the Ninth Algorithmic Number Theory Symposium (ANTSIX),
Nancy, France, July 1923, 2010, LNCS 6197, pages 8395, Springer Verlag
[the original publication is or will be available at www.springerlink.com].

The best known algorithm to compute the Jacobi symbol
of two nbit integers runs in time O(M(n) log n), using Schönhage's fast
continued fraction algorithm combined with an identity due to Gauss. We
give a different O(M(n) log n) algorithm based on the binary recursive
gcd algorithm of Stehlé and Zimmermann. Our implementation  which
to our knowledge is the first to run in time O(M(n) log n)  is faster than
GMP's quadratic implementation for inputs larger than about 10000
decimal digits.
Note: the subquadratic code mentioned in the paper is available
here.
Note added July 19, 2019: Niels Möller published on
arxiv a description of the
algorithm he implemented in GMP in 2010.
Factorization of a 768bit RSA
modulus, with Thorsten Kleinjung, Kazumaro Aoki, Jens Franke, Arjen K.
Lenstra, Emmanuel Thomé,
Joppe W. Bos, Pierrick Gaudry, Alexander Kruppa, Peter L. Montgomery,
Dag Arne Osvik, Herman te Riele and Andrey Timofeev, Proceedings of
Crypto'2010, Santa Barbara, USA, LNCS 6223, pages 333350, 2010
[technical announcement].

This paper reports on the factorization of the 768bit number
RSA768 by the
number field sieve factoring method and discusses some implications for RSA
[more details here]
Note added 21 September 2012: in Section 2.3 (Sieving) we report 47 762 243 404
unique relations (including free relations). It appears the correct number
should be about 10^{9} less, i.e., 46 762 246 508 including free
relations, and 46 705 023 046 without free relations.
Note added 09 October 2012: in Section 2.3 (Sieving) the number of remaining
prime ideals we report during the filtering (initially 35.3G, then 14.5G and
10G) are most probably underestimated by about 5G.
The Great Trinomial Hunt,
with Richard Brent, Notices of the American Mathematical
Society, volume 58, number 2, pages 233239, February 2011.
2009:
The glibc bug #10709, September 2009.
[bugzilla
entry]

On computers without doubleextended precision,
the GNU libc 2.10.1 incorrectly rounds the sine
of (the doubleprecision closest to) 0.2522464. This is a bug in
IBM's Accurate Mathematical Library, which claims correct rounding,
as recommended by IEEE 7542008.
We analyze this bug and propose a fix.
Calcul formel : mode d'emploi.
Exemples en Maple, with Philippe Dumas, Claude Gomez, Bruno Salvy, March 2009 (in french) [HAL entry].

This book is a free version of the book of the same name published by Masson in
1995. The examples use an obsolete version of Maple (V.3), but most of the text
still applies to Maple and other modern computer algebra systems.
Computing predecessor and successor in rounding to nearest,
with Siegfried
Rump, Sylvie Boldo and Guillaume Melquiond, BIT
Numerical Mathematics, volume 49, number 2, pages 419431, 2009.

We give simple and efficient methods to compute and/or estimate the
predecessor and successor of a floatingpoint number using only floatingpoint
operations in rounding to nearest. This may be used to simulate interval
operations, in which case the
quality in terms of the diameter of the result is
significantly improved compared to existing approaches.
Note added January 31, 2018: JeanMichel Muller found a error in Remark 1
following Theorem 2.2, where we say that in the range [1/2,2]*eta/u, Algorithm
1 returns csup = succ(succ(c)). This is wrong. For example, in the IEEE 754
binary32 format, this range is [2^{126},2^{124}]. For
c = 2^{125}  2^{149}, Algorithm 1 returns csup = succ(c).
2008:
Worst Cases for the Exponential Function in the
IEEE 754r decimal64 Format, with Vincent Lefèvre and Damien
Stehlé, LNCS volume 5045, pages 114126,
special LNCS issue
following the Dagstuhl seminar 06021:
Reliable Implementation of Real Number Algorithms: Theory and Practice,
August 2008,

We searched for the worst cases for correct rounding of the exponential
function in the IEEE 754r decimal64 format, and computed all the bad cases
whose distance from a breakpoint (for all rounding modes) is less than
10^{15} ulp, and we give the worst ones. In particular, the worst case
for x ≥ 3 * 10^{11} is exp(9.407822313572878 * 10^{2})
= 1.09864568206633850000000000000000278... This work can be
extended to other elementary functions in the decimal64 format and allows
the design of reasonably fast routines that will evaluate these functions
with correct rounding, at least in some domains.
[Complete lists of worst cases for the exponential are available
for the IEEE 754r
decimal32 and
decimal64
formats.]
Ten New Primitive Binary Trinomials,
with Richard Brent, Mathematics of Computation 78 (2009), pages 11971199
[Brent's web
page].

We exhibit ten new primitive
trinomials over GF(2)
of record degrees 24036583, 25964951, 30402457, and 32582657.
This completes the search for the currently known Mersenne prime exponents.
Implementation of the reciprocal square root in
MPFR, March 2008 (extended abstract),
Dagstuhl Seminar Proceedings following
Dagstuhl seminar 08021 (Numerical validation in current hardware
architectures), January 0611, 2008.

We describe the implementation of the reciprocal square root  also called
inverse square root  as a native function
in the MPFR library. The difficulty is to implement
Newton's iteration for the reciprocal square root on top's of
GNU MP's
mpn layer, while guaranteeing a rigorous 1/2 ulp bound on the
roundoff error.
Landau's function
for one million billions,
with Marc Deléglise and JeanLouis Nicolas,
Journal de Théorie des Nombres de Bordeaux, volume 20, number 3, pages 625671,
2008.
A Maple program implementing the algorithm described in this paper is
available from
JeanLouis Nicolas web page.

Let S_{n} denote the symmetric group
with n letters, and g(n) the maximal order of an element
of S_{n}. If the standard factorization of M into primes
is M=q_{1}^{a1}
q_{2}^{a2} ...
q_{k}^{ak}, we define l(M)
to be q_{1}^{a1} +
q_{2}^{a2} + ... +
q_{k}^{ak};
one century ago, E. Landau proved that g(n)=max_{l(M) ≤ n} M
and that,
when n goes to infinity, log g(n) ~ sqrt(n log(n)).
There exists a basic algorithm to compute g(n) for 1 ≤ n
≤ N; its running time is
O(N^{3/2} / sqrt(log N))
and the needed memory is O(N);
it allows computing g(n) up to, say, one million. We describe an
algorithm to calculate g(n) for n up to 10^{15}.
The main idea is to use
the socalled lsuperchampion numbers. Similar numbers, the
superior highly composite numbers, were introduced by S. Ramanujan to
study large values of the divisor function tau(n)=sum_{d divides n} 1.
Faster Multiplication in
GF(2)[x], with Richard P. Brent, Pierrick Gaudry and Emmanuel Thomé,
Proceedings of the
Eighth Algorithmic Number Theory
Symposium (ANTSVIII), May 1722, 2008, Banff Centre, Banff, Alberta
(Canada), A. J. van der Poorten and A. Stein, editors, pages 153166,
LNCS 5011, 2008.
A preliminary version appeared as INRIA Research Report, November 2007.

In this paper, we discuss an implementation of various algorithms for
multiplying polynomials in GF(2)[x]: variants of the window methods,
Karatsuba's, ToomCook's, Schönhage's and Cantor's algorithms. For most
of them, we propose improvements that lead to practical speedups.
The code that we developed for this paper is contained in the gf2x
package, available under the GNU General Public License from
http://wwwmaths.anu.edu.au/~brent/gf2x.html.
Note (added 20 May 2008): part of this paper will be soon obsolete,
namely the base case section, with the
PCLMULQDQ
instruction from
Intel's new AVX instructionset
(compiler intrinsics,
simulator,
compiler support).
Note (added 26 Jan 2009): Gao and Mateer have found a theoretical
speedup in Cantor
multiplication, see http://cr.yp.to/f2mult.html.
Note (added 29 Nov 2011): Su and Fan analyze the use of PCLMULQDQ in
http://eprint.iacr.org/2011/589.
2007:
A Multilevel Blocking Distinct Degree Factorization
Algorithm, INRIA Research Report 6331,
with Richard P. Brent, 16 pages, October 2007.
This paper describes in detail the algorithm presented at the
8th International
Conference on Finite Fields and Applications (Fq8),
July 913, 2007, Melbourne, Australia
[extended abstract],
[Richard's slides].
A revised version appeared in a special issue of Contemporary Mathematics,
volume 461, pages 4758, 2008.

We give a new algorithm for performing the distinctdegree factorization of
a polynomial P(x) over GF(2), using a multilevel
blocking strategy. The coarsest level of blocking replaces GCD computations
by multiplications, as suggested by Pollard (1975), von zur Gathen and Shoup
(1992), and others.
The novelty of our approach is that
a finer level of blocking replaces multiplications by squarings,
which speeds up the computation
in GF(2)[x]/P(x) of certain interval polynomials
when P(x) is sparse.
As an application we give a fast
algorithm to search for all irreducible trinomials
x^{r} + x^{s} + 1 of degree
r over GF(2), while producing a certificate that can be checked
in less time than the full search.
Naive algorithms cost O(r^{2}) per trinomial, thus O(r^{3}) to search
over all trinomials of given degree r.
Under a plausible assumption about the
distribution of factors of trinomials, the new algorithm has
complexity O(r^{2} (log r)^{3/2} (log log r)^{1/2})
for the search over all trinomials of degree r.
Our implementation achieves a speedup of greater than a factor
of 560 over the naive algorithm
in the case r = 24036583 (a Mersenne exponent).
Using our program, we have found two new primitive trinomials
of degree 24036583 over GF(2) (the previous record degree was 6972593).
A GMPbased implementation of
SchönhageStrassen's large integer multiplication algorithm,
with Pierrick Gaudry and Alexander Kruppa,
Proceedings of the
International Symposium on
Symbolic and Algebraic Computation (ISSAC 2007),
Waterloo, Ontario, Canada, pages 167174,
editor C.W.Brown, 2007.

SchönhageStrassen's algorithm is one of the best known algorithms for
multiplying large integers. Implementing it efficiently is of utmost
importance, since
many other algorithms rely on it as a subroutine. We present here an improved
implementation, based on the one distributed within the GMP library.
The following ideas and
techniques were used or tried: faster arithmetic modulo 2^{n}+1,
improved cache
locality, Mersenne transforms, Chinese Remainder Reconstruction, the
sqrt(2) trick, Harley's and Granlund's tricks, improved tuning. We
also discuss some ideas we plan to try in the future.
Note: this paper was motivated by
Allan
Steel, and the corresponding code is available from
http://www.loria.fr/~zimmerma/software/.
Andrew Sutherland used our FFT code to set a new record for
elliptic curve point
counting.
Note added on April 29, 2011: TszWo Sze multiplied integers of 2^{40}
bits in about 2000 seconds using a cluster of 1350 cores :
preprint.
Time and SpaceEfficient
Evaluation of Some Hypergeometric Constants, with Howard Cheng,
Guillaume Hanrot, Emmanuel Thomé and Eugene Zima,
Proceedings of the International Symposium on
Symbolic and Algebraic Computation (ISSAC 2007),
Waterloo, Ontario, Canada, pages 8591,
editor C.W.Brown, 2007.

The currently best known algorithms for the numerical evaluation of
hypergeometric constants such as zeta(3) to d decimal digits have time
complexity O(M(d) log^{2} d) and space complexity of O(d log d) or
O(d). Following work from Cheng, Gergel, Kim and Zima, we present a new
algorithm with the same asymptotic complexity, but more efficient in practice.
Our implementation of this algorithm improves over existing programs
for the computation of Pi, and we announce a new record of 2 billion digits
for zeta(3).
Worst Cases of a Periodic Function for Large Arguments,
with Guillaume Hanrot,
Vincent Lefèvre and Damien Stehlé,
Proceedings of the 18th IEEE Symposium on Computer Arithmetic
(ARITH'18), pages 133140,
Montpellier, France, 2007.
A preliminary version appeared as
INRIA Research Report 6106,
January 2007.

One considers the problem of finding hard to round cases of a periodic
function for large floatingpoint inputs, more precisely when the function
cannot be efficiently approximated by a polynomial.
This is one of the last few issues that prevents from guaranteeing an
efficient computation of correctly rounded transcendentals for the whole
IEEE754 double precision format.
The first nonnaive algorithm for that problem is presented, with an heuristic
complexity of O(2^{0.676 p}) for a precision of p bits.
The efficiency of the algorithm is shown on the largest IEEE754 double
precision binade for the sine function, and some corresponding bad cases are
given.
We can hope that all the worst cases of the trigonometric functions
in their whole domain will be found within a few years, a task that
was considered out of reach until now.
2006:
Asymptotically Fast Division for GMP,
October 2005, revised August 2006, October 2006 and February 2015.

Until version 4.2.1, GNU MP
(GMP for short) division has complexity
O(M(n) log(n)), which is not asymptotically optimal.
We propose here some division algorithms that achieve O(M(n)) with small
constants, with corresponding GMP code.
Code is available too: invert.c computes an approximate
inverse within 1 ulp in 3M(n) [revised in February and May 2015].
A patch for the 2limb case was proposed by
Marco
Bodrato: invert.diff.
Errors Bounds on Complex FloatingPoint Multiplication, with Richard Brent
and Colin Percival,
Mathematics
of Computation volume 76 (2007), pages 14691481.
Some technical details are given
in
INRIA Research Report 6068, December 2006.
[DOI]

Given floatingpoint arithmetic with tdigit baseβ significands
in which all arithmetic operations are performed as if calculated to infinite
precision and rounded to a nearest representable value, we prove that
the product of complex values z_{0} and z_{1}
can be computed with
maximum absolute error z_{0} z_{1} (1/2)
β^{1t}
sqrt(5). In particular, this provides relative error bounds of 2^{24}
sqrt(5) and 2^{53} sqrt(5) for IEEE 754 single and double precision
arithmetic respectively, provided that overflow, underflow, and
denormals do not occur.
We also provide the numerical worst cases for IEEE 754 single and double
precision arithmetic.
20 years of ECM (©
SpringerVerlag),
with Bruce Dodson, Proceedings of
ANTS VII, July 2006.
A preliminary version appeared as
INRIA Research Report 5834,
February 2006.

The Elliptic Curve Method for integer factorization (ECM) was invented by
H. W. Lenstra, Jr., in 1985 [Lenstra87].
In the past 20 years, many improvements of ECM were proposed
on the mathematical, algorithmic, and implementation sides.
This paper summarizes the current stateoftheart,
as implemented in the GMPECM software.
Erratum: on page 541 we write ``Computer experiments indicate that these curves
have, on average, 3.49 powers of 2 and 0.78 powers of 3, while Suyama's family
has 3.46 powers of 2 and 1.45 powers of 3''. As noticed by Romain Cosset,
those experiments done by the
first author are wrong, since he ran 1000 random curves with the same
prime input p=10^{10}+19, and the results differ according on the
congruence of p mod powers of 2 and 3. With 10000 random curves on the 10000
primes just above 10^{20}, we get an average of
2^{3.34}*3^{1.68} = 63.9 for Suyama's family, and
2^{3.36}*3^{0.67} = 21.5 for curves of the form
(16d + 18) y^{2} = x^{3} + (4d + 2)x^{2} + x.
2005:
MPFR:
A MultiplePrecision Binary FloatingPoint
Library With Correct Rounding, with Laurent Fousse, Guillaume Hanrot,
Vincent Lefèvre, Patrick Pélissier,
INRIA Research Report RR5753, November 2005.
A revised version appeared in ACM TOMS (Transactions on Mathematical
Software), volume 33, number 2, article 13, 2007.

This paper presents a multipleprecision binary
floatingpoint library,
written in the ISO C language, and based on the GNU MP library.
Its particularity is to extend ideas from the IEEE754 standard
to arbitrary precision, by providing correct rounding
and exceptions.
We demonstrate how these strong semantics are achieved 
with no significant slowdown with respect to other tools 
and discuss a few applications where such a library can be useful.
Techniques algorithmiques et méthodes de
programmation (in french), 11 pages, July 2005, appeared in
Encyclopédie de
l'informatique et des systèmes
d'information, pages 929935, Vuibert, 2006.
5,341,321, June 2005.

This short note shows the nasty effects of patents for the development of free
software, even for patents that were not written with software applications
in mind.
The Elliptic Curve Method,
November 2002, revised April 2003 and September
2010,
appeared in the
Encyclopedia of Cryptography and Security, Springer, 2005
(old link).

Describes in two pages the history of ECM, how it works at high level,
improvements to the method, and some applications.
MPFR : vers un calcul flottant correct ? (in french),
Interstices, 2005.

Obtenir un seul résultat pour un calcul donné : à
première vue, cela semble une évidence ; c'est en fait un vaste
sujet de recherche auquel les chercheurs apportent petit à petit leurs
contributions. Une nouvelle étape est franchie aujourd'hui grâce
à MPFR, une bibliothèque de
calcul multiprécision sur les nombres flottants.
A primitive trinomial of degree 6972593,
with Richard Brent and Samuli Larvala,
Mathematics of Computation, volume 74, number 250, pages 10011002, 2005.

The only primitive trinomials of degree 6972593 over GF(2)
are x^{6972593} + x^{3037958} + 1 and its reciprocal.
An elementary digital plane recognition algorithm,
with Yan Gerard and Isabelle DebledRennesson,
appeared in Discrete Applied Mathematics, volume 151, issue 13,
pages 169183, 2005.

A naive digital plane is a subset of
points (x,y,z) in Z^{3} verifying
h ≤ ax+by+cz < h+max{  a,b, c } where
(a,b,c,h) in Z^{4}. Given a finite unstructured subset
of Z^{3}, determine whether there
exists a naive digital plane containing it is called
digital plane recognition. This question is
rather classical in the field of digital geometry (also called
discrete geometry). We suggest in this paper a new algorithm to
solve it. Its asymptotic complexity
is bounded by O(n^{7}) but its behavior seems to be linear in
practice. It uses an original strategy of optimization in a set of
triangular facets (triangles). The code is short and elementary
(less than 300 lines) and available on
http://www.loria.fr/~debled/plane and here.
Searching Worst Cases of a OneVariable Function Using Lattice Reduction,
with Damien Stehlé and Vincent Lefèvre,
IEEE Transactions on Computers,
volume 54, number 3, pages 340346, 2005.
A preliminary version appeared as
INRIA Research Report 4586.
Some results for the 2^{x} function doubleextended precision
are available
here.

We propose a new algorithm to find worst cases for the correct rounding of a
mathematical function of one variable. We first reduce this problem to the
real small value problemi.e., for polynomials with real coefficients. Then,
we show that this second problem can be solved efficiently by extending
Coppersmith's work on the integer small value problemfor polynomials with
integer coefficientsusing lattice reduction. For floatingpoint numbers with
a mantissa less than N and a polynomial approximation of degree d, our
algorithm finds all worst cases at distance less than N^{d^2/(2d+1)}
from a machine number in time O(N^{(d+1)/(2d+1)+epsilon}). For d=2, a
detailed study improves on the O(N^{2/3+epsilon}) complexity from
Lefèvre's algorithm to O(N^{4/7+epsilon}). For larger d, our
algorithm can be used to check that there exist no worst cases at distance
less than N^{k} in time O(N^{1/2+epsilon}).
Note added on February 10, 2020: in the IEEE TC version,
page 344, Section 5,1, line 6 of the matrix: in column 1, the coefficient
should read Ca instead of C;
the coefficient Cb should be in the second column, and
the coefficient Cc should be in the third column.
2004:
Gal's Accurate Tables Method Revisited,
with Damien Stehlé, INRIA Research Report RR5359, October 2004.
An improved version
appeared in the Proceedings of
Arith'17.
Those ideas are demonstrated by an implementation of the
exp2 function
in double precision.
Erratum in the final version of the paper: in Section 4, the
simultaneous worst case for sin and cos is t0=1f09c0c6cde5e3
and not t0=31a93fddd45e3.
See also
my coauthor page.

Gal's accurate tables algorithm aims at providing an efficient implementation of
elementary functions with correct rounding as often as possible. This method
requires an expensive precomputation of a table made of the values taken by the
function or by several related functions at some distinguished points. Our
improvements of Gal's method are twofold: on the one hand we describe what is
the arguably best set of distinguished values and how it improves the efficiency
and correctness of the implementation of the function, and on the other hand we
give an algorithm which drastically decreases the cost of the precomputation.
These improvements are related to the worst cases for the correct rounding of
mathematical functions and to the algorithms for finding them. We show that the
whole method can be turned into practice by giving complete tables for
2^{x} and sin(x) for x in [1/2,1[, in double precision.
Newton iteration revisited,
with Guillaume Hanrot, March 2004.

On March 10, 2004, Dan Bernstein announced a revised draft of his paper
Removing redundancy in highprecision Newton iteration,
with algorithms that compute a reciprocal
of order n over C[[x]] 1.5+o(1) times longer than a product;
a quotient or logarithm 2.16666...+o(1) times longer;
a square root 1.83333...+o(1) times longer;
an exponential 2.83333...+o(1) times longer.
We give better algorithms.
Note added on March 24, 2004: the 1.5+o(1) reciprocal algorithm was already
published by Schönhage (Information Processing Letters 74, 2000, p. 4146)
Note added on July 24, 2006: in a preprint
Newton's method and FFT trading, Joris van der Hoeven announces
better constants for the exponential (2.333...) and the quotient (1.666...).
[October 27, 2009: those constants need yet to be confirmed.]
Note added on April 20, 2009: as noticed by David Harvey, in Section 3,
in the Divide algorithm, Step 4 should read
q < q_{0} + g_{0} (h_{1}  ε) x^{n}, where
h = h_{0} + x^{n} h_{1}.
Indeed, after Step 3 we have q_{0} f = h_{0} +
ε x^{n} + O(x^{2n}),
i.e., q_{0} = h_{0}/f + ε/f x^{n} + O(x^{2n}).
Thus h/f = q_{0} + (h_{1}  ε)/f x^{n}
+ O(x^{2n}) = q_{0} + g_{0} (h_{1}  ε)
x^{n} + O(x^{2n}).
Note added on September 11, 2009: as noticed by David Harvey, the
1.91666...M(n) cost for the square root (Section 4) is incorrect;
it should be 1.8333...M(n) instead. Indeed, Step 3 of Algorithm SquareRoot
costs only M(n)/3 to compute f_{0}^{2} mod x^{n}1
instead of M(n)/2, since we only need two FFT transforms of size n.
David Harvey has improved the reciprocal to 1.444...M(n) and the square root
to 1.333...M(n) [preprint].
Arithmétique
flottante,
with Vincent Lefèvre, INRIA Research Report RR5105,
February 2004 (in french).

Ce document rassemble des notes d'un cours donné en 2003 dans
la filière Algorithmique Numérique et
Symbolique du DEA d'Informatique de l'Université Henri Poincaré
Nancy 1.
Ces notes sont basées en grande partie sur le livre
Elementary Functions. Algorithms and Implementation de JeanMichel
Muller.
Aussi disponible sur LibreCours.
A Formal Proof of Demmel and Hida's Accurate Summation
Algorithm, with Laurent Fousse, January 2004.

A new proof of the ``accurate summation'' algorithm proposed by Demmel and
Hida is presented.
The main part of that proof
has been written in the Coq language and verified by the Coq proof assistant.
The Middle Product Algorithm,
I. Speeding up the division and square root of
power series, with Guillaume Hanrot and Michel Quercia, AAECC, volume 14,
number 6, pages 415438, 2004.
A preliminary version appeared as
INRIA Resarch Report 3973.

We present new algorithms for the inverse, division, and
square root of power series.
The key trick is a new algorithm  MiddleProduct or, for
short, MP  computing the n middle coefficients of a
(2n1) * n full product in the same number of multiplications
as a full n * n product.
This improves previous work of Brent, Mulders, Karp and Markstein,
Burnikel and Ziegler.
These results apply both to series and polynomials.
Note added June 10, 2009: Part II of this work was planned to deal with
integer entries, but David Harvey was faster than us, see
The Karatsuba middle product for integers.
Proposal for a Standardization of Mathematical Function Implementation in FloatingPoint Arithmetic, with David Defour,
Guillaume Hanrot, Vincent Lefèvre, JeanMichel Muller and
Nathalie Revol, January 2003, Numerical Algorithms, volume 37, number 14,
pages 367375, 2004.
Extended version appeared as
INRIA Research Report RR5406.

Some aspects of what a standard for the implementation of the elementary
functions could be are presented. Firstly, the need for such a standard is
motivated. Then the proposed standard is given. The question of roundings
constitutes an important part of this paper: three levels are proposed,
ranging from a level relatively easy to attain (with fixed maximal relative
error) up to the best quality one, with correct rounding on the whole range
of every function.
We do not claim that we always suggest the right choices, or that we have
thought about all relevant issues. The mere goal of this paper is to raise
questions and to launch the discussion towards a standard.
A long note on Mulders'
short product, with Guillaume Hanrot, INRIA Research Report RR4654,
November 2002. A revised version appeared in the
Journal of Symbolic Computation, volume 37, pages 391401, 2004.
A corrigendum appeared in 2014 (pdf).

The short product of two power series is
the meaningful part of the product of these objects,
i.e., sum(a[i] b[j] x^{i+j}, i+j < n). In [Mulders00], Mulders gives an
algorithm to compute a short product faster than the full product in the
case of Karatsuba's multiplication [KaOf62].
This algorithm works by selecting a
cutoff point k and performing a full k x k product and two
(nk) x (nk) short products recursively. Mulders also gives a
heuristically optimal cutoff point beta n. In this paper, we determine
the optimal cutoff point in Mulders' algorithm. We also give a slightly
more general description of Mulders' method.
Note added November 24, 2011: Murat Cenk and Ferruh Ozbudak published a paper
"Multiplication of polynomials modulo x^{n}" in Theoretical Computer
Science, vol. 412, pages 34513462, 2011. The main result of this paper
(Theorem 3.1) is a direct consequence of Mulders' short product, which is not
cited in this paper. Blame to the authors, to the anonymous reviewers and to
the editor in charge, Victor Y. Pan.
Moreover the results in Table 1 of this paper are not optimal:
M̂(14) <= M(10) + 2*M̂(4) <= 39+2*8 = 55 (instead of 56);
M̂(16) <= M(12) + 2*M̂(4) <= 51+2*8 = 67 (instead of 70);
M̂(17) <= M(12) + 2*M̂(5) <= 51+2*11 = 73 (instead of 76);
M̂(18) <= M(12) + 2*M̂(6) <= 51+2*15 = 81 (instead of 85).
Note added February 10, 2014: Karim Belabas pointed out an error in Algorithm
ShortProduct (and in Theorem 2). Indeed, the result is not necessarily reduced
modulo x^{n}, since for example with f=2*x+1 and g=3*x+4 with n=2,
we have l=1*4=4, h=2*3=6, m=(1+2)*(4+3)46=11, thus the result is
4 + 11*x + 6*x^{2}. To fix this, it suffices to zero out the
coefficient of x^{n} in the final result.
2003:
Random number generators with period divisible by a Mersenne prime,
with Richard Brent,
Proceedings of Computational Science and its Applications (ICCSA),
LNCS 2667, pages 110, 2003.
[HAL]

Pseudorandom numbers with long periods and good statistical properties are often required for applications in computational finance. We consider the requirements for good uniform random number generators, and describe a class of generators whose period is a Mersenne prime or a small multiple of a Mersenne prime. These generators are based on "almost primitive" trinomials, that is trinomials having a large primitive factor. They have very fast vector/parallel implementations and excellent statistical properties.
A Binary Recursive Gcd Algorithm, with Damien Stehlé,
INRIA Research Report RR5050, December 2003.
A revised version (© SpringerVerlag)
is published in the Proceedings of the
Algorithmic Number Theory Symposium (ANTS VI).
[Damien's page with erratum]
[implementation in GMP]

The binary algorithm is a variant of the Euclidean algorithm that
performs well in practice. We present a quasilinear time
recursive algorithm that computes the greatest common divisor of
two integers by simulating a slightly modified version of the
binary algorithm. The structure of the
recursive algorithm is very close to the one of the wellknown
KnuthSchönhage fast gcd algorithm, but the description and the
proof of correctness are significantly simpler in our case. This
leads to a simplification of the implementation and to better
running times.
Note (added 14 Oct 2009): since that work, Niels Möller has designed a
classical lefttoright/MSB fast gcd algorithm, cf his paper
On Schönhage's
algorithm and subquadratic integer gcd computation (Mathematics of
Computation, volume 77, number 261, pages 589607, 2008) and his
slides.
Note (added 1st April 2010): Robert Harley had published in his
ECDL code a similar
extended binary gcd algorithm (however only in the quadratic case).
Note (added 26 February 2019): as reported by BoYin Yang and Dan Bernstein,
the sequence G_{n} = 0,1,1,5,9,29,65,181,441,1165,..., which is the
worstcase for Theorem 2, is not necessarily the worstcase in practice
(as claimed in Section 6.2). See their
paper.
Algorithms for finding almost irreducible and almost primitive trinomials,
with Richard Brent, April 2003, Proceedings of a Conference in Honour of Professor H. C. Williams, Banff, Canada (May 2003), The Fields Institute, Toronto.

Consider polynomials over GF(2).
We describe efficient
algorithms for finding trinomials with
large irreducible (and possibly primitive) factors, and give examples
of trinomials
having a primitive factor of degree r for all Mersenne exponents
r = + 3 mod 8 in the range 5 < r < 2976221,
although there is no irreducible trinomial of degree r.
We also give
trinomials with a primitive factor of degree r = 2^{k}
for 3 <= k <= 12. These trinomials enable efficient
representations of the finite field GF(2^{r}).
We show how trinomials with large primitive factors can
be used efficiently in applications where primitive trinomials
would normally be used.
Note added April 22, 2009: this paper is mentioned in Divisibility of
Trinomials by Irreducible Polynomials over F_{2}, by Ryul Kim and
Wolfram Koepf, International Journal of Algebra, Vol. 3, 2009, no. 4, 189197
(arxiv).
Accurate Summation: Towards a Simpler and Formal Proof,
with Laurent Fousse, March 2003, in Proc. of
RNC'5, pages 97108.

This paper provides a simpler proof of the ``accurate summation'' algorithm
proposed by Demmel and Hida in
DeHi02.
It also gives improved bounds in some cases, and examples showing that those
new bounds are optimal.
This simpler proof will be used to obtain a computergenerated proof
of DemmelHida's algorithm, using a proof assistant like HOL, PVS or Coq.
10^{2098959} [in french], décembre 2002,
paru dans la
Gazette du Cines,
numéro 14, janvier 2003.

Cet article décrit les premiers résultats de la recherche
(avec Richard Brent et Samuli Larvala) de trinômes primitifs de
degré 6972593 sur GF(2), et indique quelques conséquences
amusantes de la loi de Moore.
A Fast Algorithm for Testing Reducibility of Trinomials mod 2
and Some New Primitive Trinomials of Degree 3021377,
with Richard Brent and Samuli Larvala, Mathematics of Computation,
volume 72, number 243, pages 14431452, 2003.
A preliminary version appeared as
Report
PRG TR1300, Oxford
University Computing Laboratory, December 2000.

The standard algorithm for testing irreducibility of a trinomial of prime degree r over GF(2) requires 2r + O(1) bits of memory and of order r^{2}
bitoperations. We describe an algorithm which requires only 3r/2 + O(1) bits of memory and less bitoperations than the standard algorithm.
Using the algorithm, we have found several new irreducible trinomials of high degree.
If r is a Mersenne exponent (i.e. 2^{r}  1 is a Mersenne prime), then an irreducible trinomial of degree r is necessarily primitive and can be used
to give a pseudorandom number generator with period at least 2^{r}  1. We give examples of primitive trinomials for r = 756839, 859433, and
3021377. The results for r = 859433 extend and correct some computations of Kumada et al [Mathematics of Computation 69 (2000),
811814]. The two results for r = 3021377 are primitive trinomials of the highest known degree.
2002:
Ten Consecutive Primes In Arithmetic Progression, with Harvey Dubner, Tony Forbes, Nik Lygeros, Michel Mizony and Harry Nelson, Mathematics of Computation, volume 71, number 239, pages 13231328, 2002.

In 1967 the first set of 6 consecutive primes in arithmetic progression was found. In 1995 the first set of 7 consecutive primes in arithmetic progression was found. Between November, 1997 and March, 1998, we succeeded in finding sets of 8, 9 and 10 consecutive primes in arithmetic progression. This was made possible because of the increase in computer capability and availability, and the ability to obtain computational help via the Internet. Although it is conjectured that there exist arbitrarily long sequences of consecutive primes in arithmetic progression, it is very likely that 10 primes will remain the record for a long time.
Worst Cases and Lattice Reduction, with Damien Stehlé and Vincent Lefèvre,
INRIA Research Report RR4586, October 2002.
Appeared in the proceedings of the 16th IEEE Symposium on Computer Arithmetic
(Arith'16), IEEE Computer Society, pages 142147, 2003.

We propose a new algorithm to find worst cases for correct rounding
of an analytic function.
We first reduce this problem to the real small value problem
 i.e. for polynomials with real coefficients.
Then we show that this second problem can be solved efficiently,
by extending Coppersmith's work
on the integer small value problem
 for polynomials with integer coefficients 
using lattice reduction
[Coppersmith96a,Coppersmith96b,Coppersmith01].
For floatingpoint numbers with a mantissa less than $N$,
and a polynomial approximation of degree $d$,
our algorithm finds all worst cases at distance $< N^{\frac{d^2}{2d+1}}$
from a machine number in time $O(N^{\frac{d+1}{2d+1}+\varepsilon})$.
For $d=2$, this improves on the $O(N^{2/3+\varepsilon})$ complexity
from Lef\`evre's algorithm [Lefevre00,LeMu01] to $O(N^{3/5+\varepsilon})$.
We exhibit some new worst cases found using our algorithm,
for doubleextended and quadruple precision.
For larger $d$, our algorithm can be used to check that there exist no
worst cases at distance $< N^{k}$ in time $O(N^{\frac{1}{2}+O(\frac{1}{k})})$.
A Proof of GMP Square Root, with Yves Bertot and Nicolas Magaud,
Journal of Automated Reasoning, volume 29, 2002,
pages 225252, Special Issue on Automating and Mechanising Mathematics: In
honour of N.G. de Bruijn.
A preliminary version appeared as
INRIA Research Report 4475.

We present a formal proof (at the implementation level) of an efficient
algorithm proposed by Paul Zimmermann
[Zimmermann00]
to compute square roots
of arbitrary large integers.
This program, which is part of the GNU Multiple Precision
Arithmetic Library (GMP) is completely proven within the
Coq system. Proofs are developed using the Correctness tool
to deal with imperative features of the program.
The formalization is rather large (more than 13000 lines) and requires
some advanced techniques for proof management and reuse.
Note: Vincent Lefèvre found a potential problem in the GMP implementation,
which is fixed by the following patch. This does not
contradicts our proof: the problem is due to the different C data types
(signed or not, different width), whereas our proof assumed a unique type.
Aliquot Sequence 3630 Ends After Reaching 100 Digits, with M. Benito,
W. Creyaufmüller and J. L. Varona, Experimental Mathematics, volume 11,
number 2, pages 201206.

In this paper we present a new computational record: the aliquot sequence
starting at 3630 converges to 1 after reaching a hundred decimal digits.
Also, we show the current status of all the aliquot sequences starting with a
number smaller than 10,000; we have reached at leat 95 digits for all of them.
In particular, we have reached at least 112 digits for the socalled
"Lehmer five sequences," and 101 digits for the "Godwin twelve sequences."
Finally, we give a summary showing the number of aliquot sequences of unknown
end starting with a number less than or equal 10^{6}.
Note added 29 July 2008: in this paper, we say (page 203, middle of right
column) "It is curious to note that the driver 2^{9} * 3 * 11 * 31
has appeared in no place in any of the sequences given in Table 1";
Clifford Stern notes that this driver appears at index 215 of sequence 165744,
which gives 2^{9} * 3 * 7 * 11 * 31^{2} * 37 * 10594304241173.
2001:
De l'algorithmique à l'arithmétique via le calcul formel, Habilitation à diriger des recherches,
novembre 2001. (Transparents de la soutenance.)

This document presents my research contributions from 1988 to 2001,
performed first at INRIA Rocquencourt within the Algo project (1988 to 1992),
then at INRIA Lorraine and LORIA within the projects Euréca (19931997),
PolKA (19982000), and Spaces (2001).
Three main periods can be roughly
distinguished: from 1988 to 1992 where my research
focused on analysis of algorithms and random generation,
from 1993 to 1997 where I worked on computer algebra and related algorithms,
finally from 1998 to 2001 where I was interested in arbitrary precision
floatingpoint arithmetic with welldefined semantics.
Arithmétique en précision
arbitraire, rapport de recherche INRIA 4272,
septembre 2001, paru dans la revue
"Réseaux et Systèmes Répartis, Calculateurs parallèles",
volume 13, numéro 45, pages 357386, 2001 [in french].

This paper surveys the available algorithms for integer or
floatingpoint arbitrary precision calculations. After a brief
discussion about possible memory representations, known algorithms
for multiplication, division, square root, greatest common
divisor, input and output, are presented, together with their
complexity and usage. For each operation, we present the naïve
algorithm, the asymptotically optimal one, and also intermediate
"divide and conquer" algorithms, which often are very useful. For
floatingpoints computations, some generalpurpose methods are
presented for algebraic, elementary, hypergeometric and special
functions.
Tuning and Generalizing Van Hoeij's Algorithm,
with Karim Belabas and Guillaume Hanrot, INRIA Research report 4124,
February 2001.

Recently, van Hoeij's published a new algorithm for factoring
polynomials over the rational integers. This algorithms rests
on the same principle as BerlekampZassenhaus, but uses
lattice basis reduction to improve drastically on the
recombination phase. The efficiency of the LLL algorithm is very
dependent on fine tuning; in this paper, we present such tuning to
achieve better performance. Simultaneously, we describe a
generalization of van Hoeij's algorithm to factor polynomials over
number fields.
Efficient isolation of a polynomial real roots,
with Fabrice Rouillier, INRIA Research report 4113, February 2001.
Appeared in Journal of Computational and Applied Mathematics, volume 162,
number 1, pages 3350, 2004.

This paper gives new results for the isolation of real roots of a univariate
polynomial using Descartes' rule of signs, following work of Vincent,
Uspensky, Collins and Akritas, Johnson, Krandick.
The first contribution is a generic algorithm which enables one to describe
all the existing strategies in a unified framework.
Using that framework, a new algorithm is presented, which is optimal
in terms of memory usage, while doing no more computations than other
algorithms based on Descartes' rule of signs.
We show that these critical optimizations have important consequences
by proposing a full efficient solution for isolating the real roots
of zerodimensional polynomial systems.
Density results on floatingpoint invertible numbers, with Guillaume Hanrot, Joël Rivat and Gérald Tenenbaum,
Theoretical Computer Science, volume 291, number 2, 2003, pages 135141.
(The slides of a related talk I gave in January 2002 at the workshop
"Number Theory and Applications" in Luminy are
here.)

Let F_k denote the kbit mantissa
floatingpoint (FP) numbers.
We prove a conjecture of J.M. Muller according to which
the proportion of numbers in F_k with no FPreciprocal (for rounding to the
nearest element) approaches
1/23/2 log(4/3) i.e. about 0.06847689 as
k goes to infinity.
We investigate a similar question for the inverse square root.
2000:
Factorization of a 512bit RSA
Modulus,
with Stefania Cavallar, Bruce Dodson, Arjen K. Lenstra, Walter Lioen,
Peter L. Montgomery, Brian Murphy, Herman te
Riele, Karen Aardal, Jeff Gilchrist,
Gérard Guillerm, Paul Leyland, Joël
Marchand, François Morain, Alec Muffett,
Chris Putnam, Craig Putnam, Proceedings of Eurocrypt'2000,
LNCS 1807, pages 118, 2000.

On August 22, 1999, we completed the factorization of the 512bit 155digit number RSA155 with the help of the Number Field Sieve factoring method (NFS). This is a new record for factoring general numbers. Moreover, 512bit RSA keys are frequently used for the protection of electronic commerce  at least outside the USA  so this factorization represents a breakthrough in research on RSAbased systems. The previous record, factoring the 140digit number RSA140, was established on February 2, 1999, also with the help of NFS, by a subset of the team which factored RSA155. The amount of computing time spent on RSA155 was about 8400 MIPS years, roughly four times that needed for RSA140; this is about half of what could be expected from a straightforward extrapolation of the computing time spent on factoring RSA140 and about a quarter of what would be expected from a straightforward extrapolation from the computing time spent on RSA130. The speedup is due to a new polynomial selection method for NFS of Murphy and Montgomery which was applied for the first time to RSA140 and now, with improvements, to RSA155.
A proof of GMP fast division and square root implementations,
September 2000.  This short note gives a detailed correctness proof of
fast (i.e. subquadratic) versions of the
GNU MP
mpn_bz_divrem_n and mpn_sqrtrem functions, together with
complete GMP code.
The mpn_bz_divrem_n function divides (with remainder)
a number of 2n limbs by a divisor of n limbs in 2K(n), where K(n) is the
time spent in a (n times n) multiplication, using the
MoenckBorodinJebeleanBurnikelZiegler algorithm.
The mpn_sqrtrem computes the square root and the remainder
of a number of 2n limbs (square root and remainder have about n limbs each)
in time 3K(n)/2; it uses Karatsuba Square Root.
Speeding up the Division and Square Root of Power Series, with Guillaume Hanrot and Michel Quercia,
INRIA Research Report 3973, July 2000.

We present new algorithms for the inverse, quotient, or
square root of power series.
The key trick is a new algorithm  RecursiveMiddleProduct or RMP
 computing the n middle coefficients of a
(2n * n) product in essentially the same number of operations  K(n)
 than a full (n * n) product with Karatsuba's method.
This improves previous work of Mulders, Karp and Markstein,
Burnikel and Ziegler.
These results apply both to series, polynomials, and multiple precision
floatingpoint numbers.
A Maple implementation is available here,
together with slides of a talk given at ENS Paris
in January 2004.
Factorization in Z[x]: the searching phase, with John Abbott and Victor Shoup, April 2000,
Proceedings of ISSAC'2000
[HAL entry].

In this paper we describe ideas used to accelerate the Searching Phase
of the BerlekampZassenhaus algorithm, the algorithm most widely used
for computing factorizations in Z[x]. Our ideas do not alter the
theoretical worstcase complexity, but they do have a significant effect in
practice: especially in those cases where the cost of the Searching
Phase completely dominates the rest of the algorithm. A complete
implementation of the ideas in this paper is publicly available in the
library NTL. We give timings of this implementation on
some difficult factorization problems.
1999:
Karatsuba Square
Root, INRIA Research Report 3905, November 1999.

We exhibit an algorithm to compute the squareroot with remainder of a nword
number in (3/2) K(n) word operations, where K(n) is the number of words
operations to multiply two nword numbers using Karatsuba's algorithm. If the
remainder is not needed, the cost can be reduced to K(n) on average. This
algorithm can be used for floatingpoint or polynomial computations too;
although not optimal asymptotically, its simplicity gives a wide range of use,
from about 50 to 1,000,000 digits, as shown by computer experiments.
On Sums of Seven Cubes,
with Francois Bertault and Olivier
Ramaré, Mathematics of Computation, volume 68, number 227,
pages 13031310, 1999.

We show that every integer between 1290741 and 3.375 * 10^{12} is
a sum of 5 nonnegative cubes, from which we deduce that every integer which
is a cubic residue modulo 9 and an invertible cubic residue modulo 37 is a sum
of 7 nonnegative cubes.
Uniform Random Generation of
Decomposable Structures Using FloatingPoint Arithmetic with Alain
Denise, Theoretical Computer Science, volume 218, number 2, 219232, 1999.
A preliminary version appeared as
INRIA Research Report 3242, September 1997.
 The recursive method formalized by Nijenhuis and Wilf [NiWi78] and
systematized by Flajolet, Van Cutsem and Zimmermann [FlZiVa94], is extended
here to floatingpoint arithmetic. The resulting ADZ method enables one to
generate decomposable data structures  both labelled or unlabelled 
uniformly at random, in expected O(n^{1+ε}) time and space, after a
preprocessing phase of O(n^{2+ε}) time, which reduces to
O(n^{1+ε}) for contextfree grammars.
1998:
Estimations asymptotiques
du nombre de chemins NordEst de
pente fixée et de largeur bornée, avec Isabelle Dutour et
Laurent Habsieger, INRIA Research Report RR3585, décembre 1998
[in french].
 We study here a quantity related to the number of walks with North and East steps staying under the line of slope d starting from the origin. We give an asymptotic analysis of this quantity with respect to both the width n and the slope d, answering to a question asked by Bernard Mourrain.
1997:
Calcul formel : ce qu'il y a dans la boîte,
journées XUPS, octobre 1997.
Cinq algorithmes de calcul
symbolique, INRIA Technical Report RT0206,
notes de cours d'un module de spécialisation du DEA d'informatique
de l'Université Henri Poincaré Nancy 1, 1997 [in french].

These are lecture notes of a course entitled "Some computer algebra
algorithms" given by the author at the University of Nancy 1 in 1997. Five
fundamental algorithms used by computer algebra systems are briefly described:
Gosper's algorithm for computing indefinite sums, Zeilberger's algorithm for
definite sums, Berlekamp's algorithm for factoring polynomials over finite
fields, Zassenhaus' algorithm for factoring polynomials with integer
coefficients, and Lenstra's integer factorization algorithm using elliptic
curves. All these algorithms were implemented  or improved  by the
author in the computer algebra system MuPAD.
Progress Report on
Parallelism in MuPAD, with
Christian Heckler and Torsten Metzner,
Inria Research Report 3154, April 1997.

MuPAD is a general purpose computer algebra system with two programming concepts for parallel processing: microparallelism for sharedmemory machines and macroparallelism for distributed architectures. This article describes language instructions for both concepts, the current state of implementation, together with some examples.
1996:
Polynomial Factorization Challenges,
with L. Bernardin and M. Monagan, poster presented at
the International Symposium on Symbolic and Algebraic Computation
(ISSAC), July 1996, 4 pages.

Joachim von zur Gathen has proposed a challenge for factoring
univariate polynomials over finite fields to evaluate the practicability
of current factorization algorithms ("A Factorization Challenge",
SIGSAM Bulletin 26(2):2224, 1992).
More recently, Victor Shoup has proposed an alternate family of polynomials
with a similar goal in mind.
Our effort is to take these challenges on using the general purpose computer
algebra systems Maple and MuPAD.
The result of our work are the factorizations of the von zur Gathen polynomials
f_{n} and of the Shoup polynomials F_{n} for n from 1 to 500.
We also present
the factorization of the degree 1000 von zur Gathen polynomial
f_{1000}.
1994:
GFUN: a Maple package for the manipulation of generating and holonomic
functions in one variable, with Bruno Salvy, ACM Transactions on
Mathematical Software, volume 20, number 2, 1994.
A preliminary version appeared as
INRIA Technical Report 143, October 1992.

We describe the GFUN package which contains functions for manipulating sequences, linear recurrences or differential equations and generating functions of various types. This document is intended both as an elementary introduction to the subject and as a reference manual for the package.
Random walks, heat equation and distributed algorithms, with Guy
Louchard, René Schott and Michael Tolley, Journal of Computational and
Applied Mathematics, volume 53, pages 243274, 1994.

New results are obtained concerning the analysis of the storage allocation
algorithm which permits one to maintain two stacks inside a shared (continuous)
memory area of fixed size m and of the banker's algorithm (a deadlock
avoidance policy).
The formulation of these problems is in terms of random walks inside polygonal
domains in a twodimensional lattice space with several reflecting barriers
and one absorbing barrier.
For the twostacks problem, the return time to the origin, the time to
absorption, the last leaving time from the origin and the number of returns
to the origin before absorption are investigated. For the banker's algorithm,
the trendfree absorbed random walk is analyses with numerical methods.
We finally analyse the average excursion along one axis for the classical
random walk: an analytic method enables us to deduce asymptotic results for
this average excursion.
The nQueens Problem, with Igor Rivin and Ilan Vardi, American
Mathematical Monthly, volume 101, number 7, pages 629639.

We give several lower and upper bounds for the number Q(n) of ways to put n
queens on an nxn chessboard, and the number T(n) to put n queens on a toroidal
chessboard (i.e., with n upper diagonals instead of 2n1).
We also conjecture that (log T(n))/(n log n) and (log Q(n))/(n log n) tend to
two positive constants.
Note added 11 September 2012:
in Remark 1 page 636, the primes p = 2q+1 are called
safe primes (and the primes q are called SophieGermain primes).
Moreover Warren Smith pointed the paper "The nqueens problem" by Bruen and
Dixon (Discrete Mathematics, 1975) which gives a construction yielding
(if I am correct) at least 8 * binomial(floor((p3)/8),2) * p different
toroidal solutions for any prime p >= 13.
Also on page 635 we say "Remark. Corollary 1 gives the first example of a set
of n's for which Q(n) grows faster than a polynomial in n". This is wrong,
since T. Kløve gives in "The modular nqueens problem" (Discrete Mathematics,
1977) a construction yielding for n=p*p at least n*(p3)*p^{p1}
(thanks again to Warren Smith for pointing us this).
1993:
A Calculus of Random
Generation, with Philippe Flajolet and Bernard Van Cutsem, Proceedings
of European Symposium on Algorithms (ESA'93), LNCS 726, pages 169180, 1993.

A systematic approach to the random generation of labelled combinatorial objects is presented. It applies to structures that are decomposable, i.e., formally specifiable by grammars involving union, product, set, sequence, and cycle constructions. A general strategy is developed for solving the random generation problem with two closely related types of methods: for structures of size n, the boustrophedonic algorithms exhibit a worstcase behaviour of the form
O(n log n); the sequential algorithms have worst case
O(n^{2}), while offering good potential for optimizations in the
average case. (Both methods appeal to precomputed numerical tables of linear size).
A companion calculus permits to systematically compute the average case cost of the sequential generation algorithm associated to a given specification. Using optimizations dictated by the cost calculus, several random generation algorithms are developed, based on the sequential principle; most of them have expected complexity 1/2 n log n, thus being only slightly superlinear. The approach is exemplified by the random generation of a number of classical combinatorial structures including Cayley trees, hierarchies, the cycle decomposition of permutations, binary trees, functional graphs, surjections, and set partitions.
Epelle : un logiciel de
détection de fautes
d'orthographe, INRIA Research Report 2030, September 1993.

This report describes the algorithm used by the epelle program, together with its implementation in the C language. This program is able to check about 30.000 words every second on modern computers, without any error, contrary to the Unix spell program which makes use of hashing methods and could thus accept wrong words. The main principle of epelle is to use digital trees (also called dictionary trees), which in addition reduces the space needed to store the list of words (by a factor of about 5 for the french dictionary). Creating a new digital tree for the franch language (about 240.000 words) takes only a dozen of seconds. The same program is directly usable for other languages and more generally for any list of alphanumeric keys.
1991:
Automatic Averagecase Analysis of
Algorithms, with Ph. Flajolet and B. Salvy, Theoretical Computer Science,
volume 79, number 1, pages 37109, 1991.

Many probabilistic properties of elementary discrete combinatorial structures of interest for the averagecase analysis of algorithms prove to be decidable. This paper presents a general framework in which such decision procedures can be developed. It is based on a combination of generating function techniques for counting, and complex analysis techniques for asymptotic estimations.
We expose here the theory of exact analysis in terms of generating functions for four different domains: the iterative/recursive and unlabelled/labelled data type domains. We then present some major components of the associated asymptotic theory and exhibit a class of naturally arising functions that can be automatically analyzed.
A fair fragment of this theory is also incorporated into a system called LambdaUpsilonOmega. In this way, using computer algebra, one can produce automatically nontrivial averagecase analyses of algorithms operating over a variety of decomposable combinatorial structures.
At a fundamental level, this paper is part of a global attempt at understanding why so many elementary combinatorial problems tend to have elementary asymptotic solutions. In several cases, it proves possible to relate entire classes of elementary combinatorial problems whose structure is well defined with classes of elementary special functions and classes of asymptotic forms relative to counting, probabilities, or averagecase complexity.
Séries
génératrices et analyse automatique d'algorithmes,
PhD thesis (in french), École Polytechnique, Palaiseau, 1991
[in french].
Also available in
postscript.

This thesis studies systematic methods to determine automatically the
averagecase cost of an algorithm. Those methods apply generally to descent
schemes in decomposable data structures, which enables one to model a large
class of problems.
More precisely, this thesis focuses on the first stage of the analysis of
an algorithm, namely the algebraic analysis, which translates the program
into mathematical objects, whereas the second stage extracts from those
mathematical objects the desired information about the averagecase cost.
We define a language to describe decomposable datastructures and descent
schemes on them. When one uses generating functions as mathematical objects
(counting generating functions for datastructures, and cost generating
functions for programs), we show that the algorithms described in this language
directly translate into systems of equations for the corresponding generating
functions, moreover using simple rules. From those equations, we can then
determine in polynomial time the exact averagecase cost for a given size of
the input datastructures. We can also use those equations to compute using
asymptotic analysis the averagecase cost when the input data size tends to
infinity, since we know that the asymptotic average cost is directly related
to the behaviour of those generating functions around their singularities.
Therefore, we show that to a given class of algorithms corresponds a
welldefined class of generating functions, and in turn a given class of
formulae for the asymptotic average cost.
Those algebraic analysis rules were included in a software tool for the
averagecase analysis of algorithms, called LambdaUpsilonOmega, which proved
useful for experiments and research.