mirror of
				https://github.com/saitohirga/WSJT-X.git
				synced 2025-10-26 10:30:22 -04:00 
			
		
		
		
	
		
			
				
	
	
		
			251 lines
		
	
	
		
			13 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			251 lines
		
	
	
		
			13 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| [/
 | |
| Copyright 2015 Paul A. Bristow.
 | |
| Copyright 2015 John Maddock.
 | |
| Distributed under the Boost Software License, Version 1.0.
 | |
| (See accompanying file LICENSE_1_0.txt or copy at
 | |
| http://www.boost.org/LICENSE_1_0.txt).
 | |
| ]
 | |
| 
 | |
| [section:root_comparison  Comparison of Root Finding Algorithms]
 | |
| 
 | |
| [section:cbrt_comparison  Comparison of Cube Root Finding Algorithms]
 | |
| 
 | |
| In the table below, the cube root of 28 was computed for three __fundamental_types floating-point types,
 | |
| and one __multiprecision type __cpp_bin_float using 50 decimal digit precision, using four algorithms.
 | |
| 
 | |
| The 'exact' answer was computed using a 100 decimal digit type:
 | |
| 
 | |
|    cpp_bin_float_100 full_answer ("3.036588971875662519420809578505669635581453977248111123242141654169177268411884961770250390838097895");
 | |
| 
 | |
| Times were measured using  __boost_timer using `class cpu_timer`.
 | |
| 
 | |
| * ['Its] is the number of iterations taken to find the root.
 | |
| * ['Times] is the CPU time-taken in arbitrary units.
 | |
| * ['Norm] is a normalized time, in comparison to the quickest algorithm (with value 1.00).
 | |
| * ['Dis] is the distance from the nearest representation of the 'exact' root in bits.
 | |
| Distance from the 'exact' answer is measured by using function __float_distance.
 | |
| One or two bits distance means that all results are effectively 'correct'.
 | |
| Zero means 'exact' - the nearest __representable value for the floating-point type.
 | |
| 
 | |
| The cube-root function is a simple function, and is a contrived example for root-finding.
 | |
| It does allow us to investigate some of the factors controlling efficiency that may be extrapolated to
 | |
| more complex functions.
 | |
| 
 | |
| The program used was [@ ../../example/root_finding_algorithms.cpp root_finding_algorithms.cpp].
 | |
| 100000 evaluations of each floating-point type and algorithm were used and the CPU times were
 | |
| judged from repeat runs to have an uncertainty of 10 %.  Comparing MSVC for `double` and `long double`
 | |
| (which are identical on this patform) may give a guide to uncertainty of timing.
 | |
| 
 | |
| The requested precision was set as follows:
 | |
| 
 | |
| [table
 | |
| [[Function][Precision Requested]]
 | |
| [[TOMS748][numeric_limits<T>::digits - 2]]
 | |
| [[Newton][floor(numeric_limits<T>::digits * 0.6)]]
 | |
| [[Halley][floor(numeric_limits<T>::digits * 0.4)]]
 | |
| [[Schr'''ö'''der][floor(numeric_limits<T>::digits * 0.4)]]
 | |
| ]
 | |
| 
 | |
| * The C++ Standard cube root function [@http://en.cppreference.com/w/cpp/numeric/math/cbrt std::cbrt]
 | |
| is only defined for built-in or fundamental types,
 | |
| so cannot be used with any User-Defined floating-point types like __multiprecision.
 | |
| This, and that the cube function is so impeccably-behaved,
 | |
| allows the implementer to use many tricks to achieve a fast computation.
 | |
| On some platforms,`std::cbrt` appeared several times as quick as the more general `boost::math::cbrt`,
 | |
| on other platforms / compiler options `boost::math::cbrt` is noticeably faster.  In general, the results are highly
 | |
| dependent on the code-generation / processor architecture selection compiler options used.  One can
 | |
| assume that the standard library will have been compiled with options ['nearly] optimal for the platform
 | |
| it was installed on, where as the user has more choice over the options used for Boost.Math.  Pick something
 | |
| too general/conservative and performance suffers, while selecting options that make use of the latest
 | |
| instruction set opcodes speed's things up noticeably.
 | |
| 
 | |
| * Two compilers in optimise mode were compared: GCC 4.9.1 using Netbeans IDS
 | |
| and Microsoft Visual Studio 2013 (Update 1) on the same hardware.
 | |
| The number of iterations seemed consistent, but the relative run-times surprisingly different.
 | |
| 
 | |
| * `boost::math::cbrt` allows use with ['any user-defined floating-point type], conveniently
 | |
| __multiprecision.  It too can take some advantage of the good-behaviour of the cube function,
 | |
| compared to the more general implementation in the nth root-finding examples.  For example,
 | |
| it uses a polynomial approximation to generate a better guess than dividing the exponent by three,
 | |
| and can avoid the complex checks in __newton required to prevent the
 | |
| search going wildly off-track.  For a known precision, it may also be possible to
 | |
| fix the number of iterations, allowing inlining and loop unrolling.  It also
 | |
| algebraically simplifies the Halley steps leading to a big reduction in the
 | |
| number of floating point operations required compared to a "black box" implementation
 | |
| that calculates the derivatives seperately and then combines them in the Halley code.
 | |
| Typically, it was found that computation using type `double`
 | |
| took a few times longer when using the various root-finding algorithms directly rather
 | |
| than the hand coded/optimized `cbrt` routine.
 | |
| 
 | |
| * The importance of getting a good guess can be seen by the iteration count for the multiprecision case:
 | |
| here we "cheat" a little and use the cube-root calculated to double precision as the initial guess.
 | |
| The limitation of this tactic is that the range of possible (exponent) values may be less than the multiprecision type.
 | |
| 
 | |
| * For __fundamental_types, there was little to choose between the three derivative methods,
 | |
| but for __cpp_bin_float, __newton was twice as fast.  Note that the cube-root is an extreme
 | |
| test case as the cost of calling the functor is so cheap that the runtimes are largely
 | |
| dominated by the complexity of the iteration code.
 | |
| 
 | |
| * Compiling with optimisation halved computation times, and any differences between algorithms
 | |
| became nearly negligible.  The optimisation speed-up of the __TOMS748 was especially noticable.
 | |
| 
 | |
| * Using a multiprecision type like `cpp_bin_float_50` for a precision of 50 decimal digits
 | |
| took a lot longer, as expected because most computation
 | |
| uses software rather than 64-bit floating-point hardware.
 | |
| Speeds are often more than 50 times slower.
 | |
| 
 | |
| * Using `cpp_bin_float_50`,  __TOMS748 was much slower showing the benefit of using derivatives.
 | |
| __newton was found to be twice as quick as either of the second-derivative methods:
 | |
| this is an extreme case though, the function and its derivatives are so cheap to compute that we're
 | |
| really measuring the complexity of the boilerplate root-finding code.
 | |
| 
 | |
| * For multiprecision types only one or two extra ['iterations] are needed to get the remaining 35 digits, whatever the algorithm used.
 | |
| (The time taken was of course much greater for these types).
 | |
| 
 | |
| * Using a 100 decimal-digit type only doubled the time and required only a very few more iterations,
 | |
| so the cost of extra precision is mainly the underlying cost of computing more digits,
 | |
| not in the way the algorithm works.  This confirms previous observations using __NTL high-precision types.
 | |
| 
 | |
| [include root_comparison_tables_msvc.qbk]
 | |
| [include root_comparison_tables_gcc.qbk]
 | |
| 
 | |
| [endsect] [/section:cbrt_comparison  Comparison of Cube Root Finding Algorithms]
 | |
| 
 | |
| [section:root_n_comparison Comparison of Nth-root Finding Algorithms]
 | |
| 
 | |
| A second example compares four generalized nth-root finding algorithms for various n-th roots (5, 7 and 13)
 | |
| of a single value 28.0, for four floating-point types, `float`, `double`,
 | |
| `long double` and a __multiprecision type `cpp_bin_float_50`.
 | |
| In each case the target accuracy was set using our "recomended" accuracy limits 
 | |
| (or at least limits that make a good starting point - which is likely to give
 | |
| close to full accuracy without resorting to unnecessary iterations).
 | |
| 
 | |
| [table
 | |
| [[Function][Precision Requested]]
 | |
| [[TOMS748][numeric_limits<T>::digits - 2]]
 | |
| [[Newton][floor(numeric_limits<T>::digits * 0.6)]]
 | |
| [[Halley][floor(numeric_limits<T>::digits * 0.4)]]
 | |
| [[Schr'''ö'''der][floor(numeric_limits<T>::digits * 0.4)]]
 | |
| ]
 | |
| Tests used Microsoft Visual Studio 2013 (Update 1) and GCC 4.9.1 using source code
 | |
| [@../../example/root_n_finding_algorithms.cpp root_n_finding_algorithms.cpp].
 | |
| 
 | |
| The timing uncertainty (especially using MSVC) is at least 5% of normalized time 'Norm'.
 | |
| 
 | |
| To pick out the 'best' and 'worst' algorithms are highlighted in blue and red.
 | |
| More than one result can be 'best' when normalized times are indistinguishable
 | |
| within the uncertainty.
 | |
| 
 | |
| [/include roots_table_100_msvc.qbk]
 | |
| [/include roots_table_75_msvc.qbk]
 | |
| 
 | |
| [/include roots_table_75_msvc_X86.qbk]
 | |
| [/include roots_table_100_msvc_X86.qbk]
 | |
| 
 | |
| [/include roots_table_100_msvc_AVX.qbk]
 | |
| [/include roots_table_75_msvc_AVX.qbk]
 | |
| 
 | |
| [/include roots_table_75_msvc_X86_SSE2.qbk]
 | |
| [/include roots_table_100_msvc_X86_SSE2.qbk]
 | |
| 
 | |
| [/include roots_table_100_gcc_X64_SSE2.qbk]
 | |
| [/include roots_table_75_gcc_X64_SSE2.qbk]
 | |
| 
 | |
| [/include type_info_table_100_msvc.qbk]
 | |
| [/include type_info_table_75_msvc.qbk]
 | |
| 
 | |
| [include roots_table_100_msvc_X86_SSE2.qbk]
 | |
| [include roots_table_100_msvc_X64_AVX.qbk]
 | |
| [include roots_table_100_gcc_X64_SSE2.qbk]
 | |
| 
 | |
| Some tentative conclusions can be drawn from this limited exercise.
 | |
| 
 | |
| * Perhaps surprisingly, there is little difference between the various algorithms for __fundamental_types floating-point types.
 | |
| Using the first derivatives (__newton) is usually the best, but while the improvement over the no-derivative
 | |
| __TOMS748 is considerable in number of iterations, but little in execution time.  This reflects the fact that the function
 | |
| we are finding the root for is trivial to evaluate, so runtimetimes are dominated by the time taken by the boilerplate code
 | |
| in each method.
 | |
| 
 | |
| * The extra cost of evaluating the second derivatives (__halley or __schroder) is usually too much for any net benefit:
 | |
| as with the cube root, these functors are so cheap to evaluate that the runtime is largely dominated by the
 | |
| complexity of the root finding method.
 | |
| 
 | |
| * For a __multiprecision floating-point type, the __newton is a clear winner with a several-fold gain over __TOMS748,
 | |
| and again no improvement from the second-derivative algorithms.
 | |
| 
 | |
| * The run-time of 50 decimal-digit __multiprecision is about 30-fold greater than `double`.
 | |
| 
 | |
| * The column 'dis' showing the number of bits distance from the correct result.
 | |
| The Newton-Raphson algorithm shows a bit or two better accuracy than __TOMS748.
 | |
| 
 | |
| * The goodness of the 'guess' is especially crucial for __multiprecision.
 | |
| Separate experiments show that evaluating the 'guess' using `double` allows
 | |
| convergence to the final exact result in one or two iterations.
 | |
| So in this contrived example, crudely dividing the exponent by N for a 'guess',
 | |
| it would be far better to use a `pow<double>` or ,
 | |
| if more precise `pow<long double>`, function to estimate a 'guess'.
 | |
| The limitation of this tactic is that the range of possible (exponent) values may be less than the multiprecision type.
 | |
| 
 | |
| * Using floating-point extension __SSE2 made a modest ten-percent speedup.
 | |
| 
 | |
| *Using MSVC, there was some improvement using 64-bit, markedly for __multiprecision.
 | |
| 
 | |
| * The GCC compiler 4.9.1 using 64-bit was at least five-folder faster that 32-bit,
 | |
| apparently reflecting better optimization.
 | |
| 
 | |
| Clearly, your mileage [*will vary], but in summary, __newton seems the first choice of algorithm,
 | |
| and effort to find a good 'guess' the first speed-up target, especially for __multiprecision.
 | |
| And of course, compiler optimisation is crucial for speed.
 | |
| 
 | |
| [endsect] [/section:root_n_comparison Comparison of Nth-root Finding Algorithms]
 | |
| 
 | |
| [section:elliptic_comparison Comparison of Elliptic Integral Root Finding Algoritghms]
 | |
| 
 | |
| A second example compares four root finding algorithms for locating
 | |
| the second radius of an ellipse with first radius 28 and arc length 300, 
 | |
| for four floating-point types, `float`, `double`,
 | |
| `long double` and a __multiprecision type `cpp_bin_float_50`.
 | |
| 
 | |
| Which is to say we're solving:
 | |
| 
 | |
| [pre 4xE(sqrt(1 - 28[super 2] / x[super 2])) - 300 = 0]
 | |
| 
 | |
| In each case the target accuracy was set using our "recomended" accuracy limits 
 | |
| (or at least limits that make a good starting point - which is likely to give
 | |
| close to full accuracy without resorting to unnecessary iterations).
 | |
| 
 | |
| [table
 | |
| [[Function][Precision Requested]]
 | |
| [[TOMS748][numeric_limits<T>::digits - 2]]
 | |
| [[Newton][floor(numeric_limits<T>::digits * 0.6)]]
 | |
| [[Halley][floor(numeric_limits<T>::digits * 0.4)]]
 | |
| [[Schr'''ö'''der][floor(numeric_limits<T>::digits * 0.4)]]
 | |
| ]
 | |
| Tests used Microsoft Visual Studio 2013 (Update 1) and GCC 4.9.1 using source code
 | |
| [@../../example/root_elliptic_finding.cpp root_elliptic_finding.cpp].
 | |
| 
 | |
| The timing uncertainty (especially using MSVC) is at least 5% of normalized time 'Norm'.
 | |
| 
 | |
| To pick out the 'best' and 'worst' algorithms are highlighted in blue and red.
 | |
| More than one result can be 'best' when normalized times are indistinguishable
 | |
| within the uncertainty.
 | |
| 
 | |
| [include elliptic_table_100_msvc_X86_SSE2.qbk]
 | |
| [include elliptic_table_100_msvc_X64_AVX.qbk]
 | |
| [include elliptic_table_100_gcc_X64_SSE2.qbk]
 | |
| 
 | |
| Remarks:
 | |
| 
 | |
| * The function being solved is now moderately expensive to call, and twice as expensive to call
 | |
| when obtaining the derivative than when not.  Consequently there is very little improvement in moving
 | |
| from a derivative free method, to Newton iteration.  However, once you've calculated the first derivative
 | |
| the second comes almost for free, consequently the third order methods (Halley) does much the best.
 | |
| * Of the two second order methods, Halley does best as would be expected: the Schroder method offers better
 | |
| guarantees of ['quadratic] convergence, while Halley relies on a smooth function with a single root to
 | |
| give ['cubic] convergence.  It's not entirely clear why Schroder iteration often does worse than Newton.
 | |
| 
 | |
| [endsect][/section:elliptic_comparison Comparison of Elliptic Integral Root Finding Algoritghms]
 | |
| 
 | |
| [endsect] [/section:root_comparison  Comparison of Root Finding Algorithms]
 | |
| 
 |