Emmanuel Thomé — 2022 Cse 291 14

CSE 291 (14), Winter 2022: The Number Field Sieve #

3d plot of log(F(x,y))

Instructor Emmanuel Thomé.

Lectures Tuesday/Thursday, 11:00am-12:20pm, CSE building (EBU3B) 4258, and on Zoom during remote instruction period: https://ucsd.zoom.us/j/95170048002 (UCSD login required). Instruction will resume in person on Tuesday, February 1st. I'll do my best to live stream on zoom as well, but audio will most likely be absolutely terrible.

Office hours Tuesday 2pm-3pm, CSE building, EBU3B 3130 (or outside if warm enough). Or by appointment. (Note: only by appointment during remote learning, as I'm not on campus on a regular basis.)

Course web page https://cseweb.ucsd.edu/classes/wi22/cse291-14/ (this page)

Links

  • canvas ; I'll put at least the course slides on canvas, and I might use it more. But the present page will most probably have more information.
  • piazza ; Feel free to ask questions.

Overview #

The Number Field Sieve is the algorithm of choice for the important challenges of factoring integers and computing discrete logarithms. The hardness of these tasks is a fundamental assumption in public-key cryptography. The Number Field Sieve is a highly complicated algorithm. This course will try to cover the different steps of the algorithm in some detail, with a view towards their practical implementation.

The course will be more of algorithmic / implementation nature than on cryptography per se.

Background #

An "ideal" background would probably be pretty broad. The important things so as to not get lost would be undergrad mathematical stuff on rings, finite fields, linear algebra, and vector spaces. Algorithm-wise, the arithmetic of finite fields is important (representation of elements, how operations are performed, and a perception of their relative cost). Complexity notions are important as well. General algorithmic knowledge is desirable. There's a good deal of intersection between what I will assume is known and the first chapters of von zur Gathen and Gehrard's Modern Computer Algebra book (chapters 2 to 4 in particular), or with select chapters of V. Shoup's A Computational Introduction to Number Theory and Algebra (link).

The 1993 book by A.K. Lenstra and H.W. Lenstra, Jr, The Development of the Number Field Sieve (link) is the authoritative reference for the high-level view of NFS, and for its early history as an algorithm.

Familiarity with mathematical software such as SageMath is welcome. I will also use cado-nfs as an illustration platform for several things in class. Both are primarily Unix things, so if you want to follow anything software-related in this course, you're encouraged to either use a linux (or other unix) machine, or prepare a virtual machine for this purpose on your favorite computer.

Grading #

  • 80% of the grade is based on participation and attendance in class.
  • 20% is based on a small project, to be done individually. Your goal here is to pick a point that I mention in class, or that appears in the related literature that I list (e.g. "X is better than Y", or "X sucks because of Z"), and to illustrate so as to make a point based on detailed data (a priori experimental, but not necessarily). Have in mind that you're preparing one slide of a conference talk (or maybe two slides) and that you want to insist on that particular point. You may use any tool that you see fit, including actual implementations of NFS, and present the result in any way you like. Just don't recycle old stuff that doesn't come from you. I will appreciate these small projects based on how relevant they are to the number field sieve, and how much work you put into it. Small projects will be allocated a short time slot in class (5 minutes max) during which you'll have the opportunity to explain what you want to illustrate. Schedule-wise, please coordinate with me by e-mail regarding what you intend to prepare and present.

Literature #

There is no textbook for this class, but a very good read that covers both some of the required background, and also the general context of what this course addresses, is V. Shoup's book A Computational Introduction to Number Theory and Algebra, which happens to be accessible online (link).

Schedule #

What follows is a highly tentative (and possibly overly ambitious) schedule. This is intended to give a general idea of the topics I will probably cover, and it's entirely possible that I drop some items in this list, that I rearrange the schedule, and so on.

Direct access:

Tuesday 01/04: lecture 1 #

lecture slides

  • General introduction, course facts.
  • The cado-nfs software.
  • Warm up: prime numbers, compositeness testing.

Related literature:

  • V. Shoup, A Computational Introduction to Number Theory and Algebra, 2008. link
  • S. Galbraith, Mathematics of Public Key Cryptography, chapter 12, 2012. link.
  • J. von zur Gathen, J. Gehrard, Modern Computer Algebra, chapter 19, 1999.
  • D.E. Knuth, The Art of Computer Programming, volume 2 §4.5.4 Factoring into Primes, 1998 edition.

Thursday 01/06: lecture 2 #

lecture slides

  • Sieving and batch trial division.
  • "old" factoring algorithms.
  • Implementation points for old factoring algorithms.
  • Combinations of congruences
  • Dixon random squares, cfrac, and the quadratic sieve algorithm.

Related literature:

  • D.J. Bernstein, How to find small factors of integers, 2000. link
  • S. Galbraith, Mathematics of Public Key Cryptography, chapter 15, 2012. link.
  • J. von zur Gathen, J. Gehrard, Modern Computer Algebra, chapter 19, 1999.

Tuesday 01/11: lecture 3 #

lecture slides

  • QS-era folklore improvements: large primes, special-q.
  • Factoring by electronic mail, and other QS feats.
  • Smoothness and complexity analysis of QS
  • Tools for the complexity analysis. Smooth integers.
  • The L notation
  • Examples of functions that are in L(blah)^{1+o(1)}
  • Arithmetic with L(blah)

Related literature:

  • A.K. Lenstra, M.S. Manasse, Factoring by Electronic Mail, Eurocrypt 1989 link
  • E. Bach, R. Peralta, Asymptotic Semismoothness Probabilities, Mathematics of Computation, Volume 65, Number 216 October 1996, Pages 1701–1715 link

Thursday 01/13: lecture 4 #

lecture slides

NFS (for factoring) in a nutshell.

  • A rosy example where everything is easy.
  • Number field basics. How to represent elements.
  • How much of this example was really exceptionally easy.

Related literature:

  • J.M. Pollard, Factoring with Cubic Integers, 1993. link.
  • A. Guillevic, RSA, integer factorization, record computations, 2021. link
  • C. Pomerance. A Tale of Two Sieves, Notices of the AMS, 1996. link

Tuesday 01/18: lecture 5 #

lecture slides

Algebraic number theory background.

  • Number fields, algebraic numbers.
  • Algebraic integers.
  • Ideals.
  • Factoring into prime ideals.
  • Units.
  • Brief mention of further topics that I don't cover for brevity.

Related literature:

  • P. Samuel, Algebraic Theory of Numbers, Hermann, 1970.
  • H. Cohen, A course in Computational Algebraic Number Theory, Springer GTM 138, chapters 4 and 6, 1993. link.

Thursday 01/20: lecture 6 #

lecture slides

NFS in the not-so-easy case.

  • Things that can go wrong will go wrong.
  • Down-to-earth representation of ideals.
  • How hard is it to work out the number-theoretic details? Things that are easy, things that are truly hard.

Related literature:

  • J.P. Buhler, H.W. Lenstra, Jr, C. Pomerance, Factoring integers with the number field sieve, 1993. link.

Tuesday 01/25: lecture 7 #

lecture slides

Note At this point I'm running about a lecture behind compared to my original schedule, in particular because I inserted NFS analysis at this point. The schedule of the forthcoming lectures has been rearranged, but this is likely to change again.

Things that we did cover on 01/25:

  • How to make sense of an NFS relation.
  • Statement of the NFS algorithm.
  • The NFS diagram.
  • Rundown of a real NFS computation
  • Analysis of NFS.

Thursday 01/27: lecture 8 #

lecture slides

Polynomial selection

  • What can and what cannot be achieved.
  • The favorable case of the special number field sieve.
  • Skewness.
  • Algorithms to find polynomials with small coefficients. Kleinjung's 2005 and 2008 algorithms.
  • Time-critical parts in polynomial selection algorithms.

Related literature:

  • T. Kleinjung, Polynomial Selection for the Number Field Sieve, in Topics in Computational Number Theory Inspired by Peter L. Montgomery, 2017 link.
  • T. Kleinjung, On Polynomial Selection for the General Number Field Sieve, Mathematics of Computation, 2008 link.
  • S. Bai, C. Bouvier, A. Kruppa, P. Zimmermann, Better polynomials for GNFS, Mathematics of Computation, 2015. link.

Tuesday 02/01: lecture 9 #

lecture slides

Polynomial selection (continued)

  • Different metrics to assess the yield of a polynomial pair.
  • The importance of real roots.
  • Area below a given norm. Bernstein's integral calculation.
  • Sieving for root properties, and Murphy's alpha value.
  • Implementation constraints for polynomial selection.
  • Modeling the expected result of root sieving. Order statistics.
  • Polynomial selection in cado-nfs, and in large computations in general.

Related literature:

  • M. Elkenbracht-Huizing, An Implementation of the Number Field Sieve, 1996. link, other link
  • T. Kleinjung, On Polynomial Selection for the General Number Field Sieve, Mathematics of Computation, 2008 link.
  • B.A. Murphy, Polynomial Selection for the Number Field Sieve Integer Factorisation Algorithm, PhD thesis, Canberra, 1999. link.

Thursday 02/03: lecture 10 #

lecture slides

Collecting relations in NFS.

  • multiple ways to collect relations.
    • ECM everyone
    • batch everyone
    • sieve
  • Special-q, lattice sieving.
  • The sieving region is isotropic because of special-q (done on Tuesday)
  • Projective primes and sieving with powers.
  • The basis of the q-lattice, and skewed Gauss reduction

Related literature:

  • D.J. Bernstein, A.K. Lenstra, A general number field sieve implementation, 1993. link.
  • M. Elkenbracht-Huizing, An Implementation of the Number Field Sieve, 1996. link, other link

Tuesday 02/08: lecture 11 #

lecture slides

Sieving techniques

  • line-sieving. Bottlenecks.
  • Franke-Kleinjung enumeration.
  • Bucket sieving and layered bucket sieving.
  • Resieving.
  • The lattice sieving program in cado-nfs.
  • Parallelization.
  • Trading sieving with batch smoothness detection.

Some extra topics that are in the slides and that are not part of the course:

  • norm initialization.
  • adjusting the sieve region.

Related literature:

  • J. Franke, T. Kleinjung, Continued Fractions and Lattice Sieving, SHARCS 2005. link.
  • F. Boudot, A. Guillevic, P. Gaudry, N. Heninger, E. Thomé, Comparing the Difficulty of Factorization and Discrete Logarithm: A 240-Digit Experiment, CRYPTO 2020. link.
  • T. Kleinjung et al, Computation of a 768-bit prime field discrete logarithm, EUROCRYPT 2017. link

Thursday 02/10: lecture 12 #

lecture slides

Filtering

  • Gaussian elimination and structured Gaussian elimination
  • Markowitz rule
  • Cavallar filtering
  • Rules of thumb in filtering.
  • Simulating filtering.
  • Target metrics for a "good" output matrix.
  • Parallelization of filtering.

Related literature:

  • C. Pomerance, J.W. Smith, Reduction of huge, sparse matrices over finite fields via created catastrophes, 1992. link
  • S. Cavallar, Strategies in Filtering in the Number Field Sieve, ANTS 2000 link
  • C. Bouillaguet, P. Zimmermann, Parallel Structured Gaussian Elimination for the Number Field Sieve, 2021. link.

Tuesday 02/15: lecture 13 #

lecture slides

Linear algebra

  • How sparse are NFS matrices, and what do they look like?
  • What is a black-box algorithm?
  • Algorithms that can be used over finite fields.
  • Radical differences with the PDE / numerical context.

How to multiply a very sparse matrix by a vector efficiently.

Related literature:

  • B. LaMacchia, A. Odlyzko, Solving large sparse linear systems over finite fields, CRYPTO 1990, link.

Thursday 02/17: lecture 14 #

lecture slides

Algorithms for sparse linear algebra.

  • The Lanczos algorithm.
  • The Wiedemann algorithm.
  • Asymptotically fast Berlekamp-Massey
  • The appeal of block algorithms.
  • The block Lanczos algorithm
    • Rapid overview.
    • Limitations.
  • The block Wiedemann algorithm (beginning)

Related literature:

  • D.H. Wiedemann, Solving sparse linear equations over finite fields, IEEE Trans. Inf. Theory IT-32, pp. 54-62, 1986.
  • W. Eberly, E. Kaltofen, On Randomized Lanczos Algorithm, ISSAC 1997 link
  • P. L. Montgomery, A Block Lanczos for finding dependencies over GF(2), Eurocrypt 1995. link
  • E. Thomé, _The Block Lanczos Algorithm, in Topics in Computational Number Theory Inspired by Peter L. Montgomery, 2017 link

Tuesday 02/22: lecture 15 #

lecture slides

The block Wiedemann algorithm

  • Different steps of the algorithm.
  • Checking for data consistency.
  • Anticipating computation time and required resources.
  • Implementation on a cluster. Choosing the checkpoint frequency wisely.

Related literature:

  • E. Kaltofen, Analysis of Coppersmith’s Block Wiedemann Algorithm for the Parallel Solution of Sparse Linear Systems, Mathematics of Computation, 1995. link
  • J.-G. Dumas, E. Kaltofen, E. Thomé, G. Villard, Linear Time Interactive Certificates for the Minimal Polynomial and the Determinant of a Sparse Matrix, ISSAC 2016. link
  • E. Kaltofen, A. Lobo, Distributed Matrix-Free Solution of Large Sparse Linear Systems over Finite Fields, Algorithmica, 1999 link
  • B. Beckermann, G. Labahn, A uniform approach for the fast computation of matrix-type Padé approximants, SIAM Journal on Matrix Analysis and Applications, 1994. link, other link
  • E. Thomé, Subquadratic computation of vector generating polynomials and improvement of the block Wiedemann algorithm, Journal of Symbolic Computation, 2002. link,

Thursday 02/24: lecture 16 #

lecture slides

Computation of square roots.

  • Problem statement.
  • The bovine approach.
  • Caveats with the Galois group.
  • A CRT-based approach.
  • Montgomery's approach. Complexity.
  • Psychological aspects and crazy optimizations of a computationally negligible step.

Related literature:

  • P.L. Montgomery, Square roots of products of algebraic numbers, hard-to-find draft, 1997 my copy online.
  • P. Nguyen, A Montgomery-Like Square Root for the Number Field Sieve, ANTS 1998. link
  • E. Thomé, Square Root Algorithms for the Number Field Sieve, WAIFI 2012, link.
  • R.P. Brent, P. Zimmermann, Modern Computer Arithmetic, Cambridge University Press, 2010. link.

Tuesday 03/01: lecture 17 #

lecture slides

Discrete logarithms in finite fields.

  • Old algorithms
  • NFS-DL: specifics; Joux-Lercier polynomial selection.
  • The answer to the FAQ is that we don't care what is the generator.
  • How to map an ideal to an element, in an idealized setting.

Related literature:

  • S. Galbraith, Mathematics of Public Key Cryptography, chapter 15, link.
  • V. Shoup, lower bounds for discrete logarithms and related problems, EUROCRYPT 1997. link

Thursday 03/03: lecture 18 #

lecture slides

Discrete logarithms in finite fields.

  • Virtual logarithms (my way)
  • The NFS-DL diagram, and how to make sense of a relation.
  • Schirokauer maps and sparse linear algebra
  • Individual logarithm computations

Related literature:

  • A. Commeine, I. Semaev, An Algorithm to Solve the Discrete Logarithm Problem with the Number Field Sieve, PKC 2006. link.
  • J. Fried, P. Gaudry, N. Heninger, E. Thomé, A kilobit hidden SNFS computation, EUROCRYPT 2017. link.
  • D. Coppersmith, Fast evaluation of logarithms in fields of characteristic two, IEEE Trans. Inf. Theory, 1984. link.
    • O. Schirokauer, Discrete Logarithms and Local Units, Phil. Trans. R. Soc. London A, 1993. link, other link.
  • O. Schirokauer, Virtual Logarithms, J. Algorithms, 2005. link.
    • O. Schirokauer, Using number fields to compute logarithms in finite fields, Mathematics of Computation, 2000. link.

Tuesday 03/08: lecture 19 #

lecture slides

Variants of NFS

  • SNFS(-DL) versus GNFS(-DL).
  • FFS.
  • Small characteristic, medium characteristic: what does that mean?
  • The quasi-polynomial algorithm for small characteristic.
  • MNFS.

Related literature:

  • R. Barbulescu, P. Gaudry, A. Guillevic, and F. Morain. Improving NFS for the discrete logarithm problem in non-prime finite fields, EUROCRYPT 2015. link.
  • A. Guillevic, F. Morain. Discrete Logarithms, in N. El Mrabet and M. Joye, Guide to pairing-based cryptography, CRC Press, 2016. link.
  • G. De Micheli, Discrete Logarithm Cryptanalyses: Number Field Sieve and Lattice Tools for Side-Channel Attacks, PhD thesis, Nancy, 2021. link.
  • A. Joux, R. Lercier, N. Smart, F. Vercauteren, The Number Field Sieve in the Medium Prime Case, CRYPTO 2006. link.
  • M. Delcourt et al, Using the Cloud to Determine Key Strengths -- Triennial Update, technical report, 2018. link.
  • D.J. Bernstein, Arbitrarily tight bounds on the distribution of smooth integers. Pages 49–66 in Number theory for the Millennium volume 1, 2000. link.
  • T. Kleinjung, B. Wesolowski, Discrete logarithms in quasi-polynomial time in finite fields of fixed characteristic, J. Amer. Math. Soc. 35 (2022), 581-624. link.

Thursday 03/10: lecture 20 #

lecture slides

  • A brief timeline
  • What the complexity does and does not say of the hardness of attacking larger key sizes.
  • Takeaways from recent computations.

Related literature:

  • A. Le Gluher, P.-J. Spaenlehauer, E. Thomé, Refined Analysis of the Asymptotic Complexity of the Number Field Sieve, link.
  • F. Boudot, A. Guillevic, P. Gaudry, N. Heninger, E. Thomé, Comparing the Difficulty of Factorization and Discrete Logarithm: A 240-Digit Experiment, CRYPTO 2020. link.
  • J. Fried, P. Gaudry, N. Heninger, E. Thomé, A kilobit hidden SNFS computation, EUROCRYPT 2017. link.
  • See database for Dlog records here.
  • Largest GNFS factorization to date: link ; largest SNFS factorization to date: link.