Brief Description of chkfft, by K1JT ------------------------------------ Discrete Fourier transforms (DFTs) are found at the root of most digital signal processing tasks. In WSJT and its sister programs the transforms are done using the FFTW library, and subroutine four2 provides a convenient interface to the library. Program chkfft is a command-line utility offering a convenient way to test FFT execution times under a variety of circumstances. To compile chkfft in Linux: $ gfortran -o chkfft chkfft.f90 four2a.f90 f77_wisdom.f90 gran.c -lfftw3f To compile chkfft in Windows (you may need to customize the hard-coded path shown here for libfftw3f-3.dll): > gfortran -o chkfft chkfft.f90 four2a.f90 f77_wisdom.f90 gran.c \ /JTSDK-QT/appsupport/runtime/libfftw3f-3.dll To see a brief usage message, type chkfft at the command prompt: $ chkfft Usage: chkfft nr nw nc np nfft: length of FFT nfft=0: do lengths 2^n, n=2^4 to 2^23 infile: name of file with nfft values, one per line nr: 0/1 to not read (or read) wisdom nw: 0/1 to not write (or write) wisdom nc: 0/1 for real or complex data np: 0-4 patience for finding best algorithm As an example, to measure the speed of a complex DFT of length 131072: ####################################################################### $ chkfft 131072 0 1 1 2 nfft: 131072 nr: 0 nw 1 nc: 1 np: 2 NFFT Time rms MHz MFlops iters tplan ------------------------------------------------------------- 131072 0.0021948 0.00000032 59.72 5076.1 231 2.9 ####################################################################### Program output shows that on the test machine the average time for one forward (or inverse) transform of length N=131072 is about 2.2 ms, corresponding to slightly over 5 GFlops computing speed. The planning time in FFTW was 2.9 s. Running the command again with parameter nr=1 will use the "wisdom" already accumulated for complex N=131072 FFTs. The execution speed will be essentially the same, but no planning time is required: ####################################################################### $ chkfft 131072 1 1 1 2 nfft: 131072 nr: 1 nw 1 nc: 1 np: 2 NFFT Time rms MHz MFlops iters tplan ------------------------------------------------------------- 131072 0.0021575 0.00000032 60.75 5164.0 235 0.0 ####################################################################### Optimized algorithms can compute DFTs much faster for lengths that are the product of small integers. Length N=131072 = 2^17 is a good example, and FFTs should be very efficient. For comparison, look at the speed for N=131071, a prime number. The average time is now about 7 times larger: ####################################################################### C:\JTSDK-QT\src\wsjtx\lib>chkfft 131071 1 1 1 2 nfft: 131071 nr: 1 nw 1 nc: 1 np: 2 NFFT Time rms MHz MFlops iters tplan ------------------------------------------------------------- 131071 0.0153637 0.00000065 8.53 725.2 33 5.6 ####################################################################### Here's an example that measures execution times for all integral power-of-2 lengths from 2^4 to 2^23: ####################################################################### $ chkfft 0 1 1 1 2 nfft: 0 nr: 1 nw 1 nc: 1 np: 2 n N=2^n Time rms MHz MFlops iters tplan --------------------------------------------------------------- 4 16 0.0000003 0.00000014 58.61 1172.2 1000000 0.0 5 32 0.0000004 0.00000016 89.19 2229.6 1000000 0.0 6 64 0.0000006 0.00000016 109.44 3283.2 866975 0.0 7 128 0.0000009 0.00000021 135.92 4757.1 538369 0.0 8 256 0.0000016 0.00000020 158.40 6335.8 313701 0.0 9 512 0.0000032 0.00000021 162.53 7313.8 160943 0.1 10 1024 0.0000067 0.00000023 152.53 7626.5 75521 0.1 11 2048 0.0000136 0.00000025 150.42 8273.3 37239 0.2 12 4096 0.0000316 0.00000027 129.75 7784.8 16060 0.3 13 8192 0.0000720 0.00000026 113.75 7393.8 7040 0.5 14 16384 0.0001620 0.00000028 101.11 7078.0 3129 0.9 15 32768 0.0003227 0.00000030 101.53 7615.1 1571 1.7 16 65536 0.0010020 0.00000030 65.41 5232.5 506 4.1 17 131072 0.0021575 0.00000032 60.75 5164.0 235 0.0 18 262144 0.0053937 0.00000032 48.60 4374.2 94 3.6 19 524288 0.0190668 0.00000034 27.50 2612.2 27 6.8 20 1048576 0.0468001 0.00000035 22.41 2240.5 11 2.4 21 2097152 0.0936012 0.00000036 22.41 2352.5 6 31.6 22 4194304 0.1949997 0.00000037 21.51 2366.0 3 9.8 23 8388608 0.4212036 0.00000038 19.92 2290.3 2 112.9 ####################################################################### Test data for all transforms is gaussian random noise of zero mean and standard deviation 1. Tabulated values of "rms" are the root-mean-square differences between the original data and the back-transfmred data. File nfft.dat contains all numbers between 2^3 and 2^23 with no factor greater than 7, followed by their factors. These numbers are good choices for FFT lengths. File all_fft.out gives the result on one machine of running the command $ chkfft nfft.dat 0 1 1 2 Take note: this task may take as much as 24 hours, or even more!