Fork me on GitHub

Introduction

The Numerics module includes support for:

  • 1D, 2D and 3D FFTs that leverage SIMD vector instructions.
  • Limited memory BFGS optimization
  • b-Splines
  • Erf and Erfc
  • Tensor recursions

L-BFGS Optimization

The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm is appropriate for large-scale multidimensional unconstrained optimization problems. It is derived from Robert Dodier's Java translation of orignal FORTRAN code by Jorge Nocedal.

Fast Fourier Transformation

The FFT classes implement:

  • 1D Real to Complex
  • 1D Complex to Complex
  • 3D Real to Complex
  • 3D Complex to Complex
  • 3D Real to Complex Convolutions
  • 3D Complex to Complex Convolutions

The 1D methods compute the FFT of real or complex, double precision data of arbitrary length n using a mixed radix method that has special methods to handle factors of [2, 3, 4, 5, 6, 7] and a general method for larger prime factors.

The 3D methods are serial or SMP parallel. The convolutions are designed for the reciprocal space portion of Particle Mesh Ewald summation.

Selected timing results comparing FFTW and FFX on MacOS with an Apple M2 Max CPU (FFTW 3.3.10) are shown below for 3D double precision C2C round-trip FFTs.

Size Library Threads Operation Time (sec)
64×64×64 FFTW 1 Forward + Reverse FFT 0.005549
FFTW 8 Forward + Reverse FFT 0.001469
FFX 8 Forward + Reverse FFT 0.001022
FFX 8 Convolution 0.000867
128×128×128 FFTW 1 Forward + Reverse FFT 0.056120
FFTW 8 Forward + Reverse FFT 0.010322
FFX 8 Forward + Reverse FFT 0.009983
FFX 8 Convolution 0.008273
256×256×256 FFTW 1 Forward + Reverse FFT 0.633118
FFTW 8 Forward + Reverse FFT 0.155233
FFX 8 Forward + Reverse FFT 0.106308
FFX 8 Convolution 0.083182

Note that for FFX, the convolution operation is faster than the sum of individual Forward + Reverse FFTs due to eliminating two 3D matrix transpose operations.