Performance Tuning Macros

A small number of performance tuning options are controlled by configuration macros. Set these in boost/math/tools/user.hpp, or report them to the Boost development mailing list so that the appropriate option for a given compiler and OS platform can be set automatically in our configuration setup.

Macro

Meaning

BOOST_MATH_POLY_METHOD

Determines how polynomials and most rational functions are evaluated. Define to one of the values 0, 1, 2 or 3: see below for the meaning of these values.

BOOST_MATH_RATIONAL_METHOD

Determines how symmetrical rational functions are evaluated: mostly this only affects how the Lanczos approximation is evaluated, and how the evaluate_rational function behaves. Define to one of the values 0, 1, 2 or 3: see below for the meaning of these values.

BOOST_MATH_MAX_POLY_ORDER

The maximum order of polynomial or rational function that will be evaluated by a method other than 0 (a simple "for" loop).

BOOST_MATH_INT_TABLE_TYPE(RT, IT)

Many of the coefficients of the polynomials and rational functions used by this library are integers. Normally these are stored in tables as integers, but if mixed integer / floating point arithmetic is much slower than regular floating point arithmetic then they can be stored as tables of floating point values instead. If mixed arithmetic is slow then add:

#define BOOST_MATH_INT_TABLE_TYPE(RT, IT) RT

to boost/math/tools/user.hpp. Otherwise the default of:

#define BOOST_MATH_INT_TABLE_TYPE(RT, IT) IT

set in boost/math/config.hpp is fine, and may well result in smaller code.
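As a rough illustration of what this macro controls, the sketch below declares a coefficient table whose element type is chosen by BOOST_MATH_INT_TABLE_TYPE. The function evaluate_cubic and its coefficients are hypothetical, and the macro is given a local fallback definition only so that the fragment compiles on its own; in real use the definition comes from the library's configuration headers.

```cpp
#include <cassert>
#include <cstddef>

// Local fallback so this sketch is self-contained (an assumption for
// illustration; normally the library's configuration supplies this macro).
#ifndef BOOST_MATH_INT_TABLE_TYPE
#  define BOOST_MATH_INT_TABLE_TYPE(RT, IT) IT
#endif

// Hypothetical example: evaluates 6 + 11x + 6x^2 + x^3.
// The table holds ints by default, or values of type T if the macro
// is redefined to expand to RT.
template <class T>
T evaluate_cubic(T x)
{
    static const BOOST_MATH_INT_TABLE_TYPE(T, int) coeffs[] = { 6, 11, 6, 1 };
    const std::size_t count = sizeof(coeffs) / sizeof(coeffs[0]);
    assert(count >= 1);

    // Horner's method, highest-order coefficient first.
    T sum = static_cast<T>(coeffs[count - 1]);
    for (std::size_t i = count - 1; i-- > 0;)
    {
        sum = sum * x + static_cast<T>(coeffs[i]);
    }
    return sum;
}
```

With the default definition each step mixes an int coefficient with a floating point accumulator; with the RT definition the table already holds values of type T, at the cost of a larger table when T is wider than int.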

The values to which BOOST_MATH_POLY_METHOD and BOOST_MATH_RATIONAL_METHOD may be set are as follows:

Value

Effect

0

The polynomial or rational function is evaluated using Horner's method, and a simple for-loop.

Note that if the order of the polynomial or rational function is a runtime parameter, or the order is greater than the value of BOOST_MATH_MAX_POLY_ORDER, then this method is always used, irrespective of the value of BOOST_MATH_POLY_METHOD or BOOST_MATH_RATIONAL_METHOD.

1

The polynomial or rational function is evaluated without the use of a loop, and using Horner's method. This only occurs if the order of the polynomial is known at compile time and is less than or equal to BOOST_MATH_MAX_POLY_ORDER.

2

The polynomial or rational function is evaluated without the use of a loop, and using a second order Horner's method. In theory this permits two operations to occur in parallel for polynomials, and four in parallel for rational functions. This only occurs if the order of the polynomial is known at compile time and is less than or equal to BOOST_MATH_MAX_POLY_ORDER.

3

The polynomial or rational function is evaluated without the use of a loop, and using a second order Horner's method. In theory this permits two operations to occur in parallel for polynomials, and four in parallel for rational functions. This differs from method "2" in that the code is carefully ordered to make the parallelisation more obvious to the compiler, rather than relying on the compiler's optimiser to spot the parallelisation opportunities. This only occurs if the order of the polynomial is known at compile time and is less than or equal to BOOST_MATH_MAX_POLY_ORDER.
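To make the four settings concrete, the sketch below evaluates a five-coefficient polynomial in each style. It is an illustration only and not the library's implementation: all function names are hypothetical, and the coefficient count of five is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>

// Method 0 style: Horner's method in a simple loop. Works for any
// coefficient count, including one known only at runtime.
template <class T, class V>
V horner_loop(const T* a, std::size_t count, V x)
{
    assert(count >= 1);
    V sum = static_cast<V>(a[count - 1]);
    for (std::size_t i = count - 1; i-- > 0;)
    {
        sum = sum * x + static_cast<V>(a[i]);
    }
    return sum;
}

// Method 1 style: the same Horner recurrence written out with no loop,
// possible because the coefficient count is fixed at compile time.
template <class T, class V>
V horner_unrolled_5(const T (&a)[5], V x)
{
    return (((a[4] * x + a[3]) * x + a[2]) * x + a[1]) * x + a[0];
}

// Method 2 style: second order Horner. Even and odd powers form two
// independent chains in x^2, which the optimiser may overlap.
template <class T, class V>
V horner_second_order_5(const T (&a)[5], V x)
{
    V x2 = x * x;
    return ((a[4] * x2 + a[2]) * x2 + a[0]) + ((a[3] * x2 + a[1]) * x);
}

// Method 3 style: the same arithmetic as above, but with the two chains
// interleaved as separate statements so the independence is explicit.
template <class T, class V>
V horner_second_order_ordered_5(const T (&a)[5], V x)
{
    V x2 = x * x;
    V even = static_cast<V>(a[4]);
    V odd = static_cast<V>(a[3]);
    even *= x2;
    odd *= x2;
    even += static_cast<V>(a[2]);
    odd += static_cast<V>(a[1]);
    even *= x2;
    odd *= x;
    even += static_cast<V>(a[0]);
    return even + odd;
}
```

For coefficients {1, 2, 3, 4, 5} and x = 2 all four functions return 129. In floating point the second order forms sum terms in a different order from plain Horner, so results for general inputs may differ in the last bits.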

The performance test suite generates a report for your particular compiler showing which method is likely to work best. The following tables show the results for MSVC-14.0 and GCC-5.1.0 (Linux). There's not much to choose between the various methods, but generally the loop-unrolled methods perform better. Interestingly, ordering the code to try and "second guess" possible optimizations seems not to be such a good idea (method 3 below).

Table 16.3. Polynomial Method Comparison with Microsoft Visual C++ version 14.0 on Windows x64

Function  Method 0        Method 0        Method 1        Method 1        Method 2        Method 2        Method 3        Method 3
          (Double         (Integer        (Double         (Integer        (Double         (Integer        (Double         (Integer
          Coefficients)   Coefficients)   Coefficients)   Coefficients)   Coefficients)   Coefficients)   Coefficients)   Coefficients)

Order 2   -               -               1.00 (9ns)      1.00 (9ns)      1.00 (9ns)      1.00 (9ns)      1.00 (9ns)      1.00 (9ns)
Order 3   2.08 (25ns)     2.75 (33ns)     1.08 (13ns)     1.08 (13ns)     1.08 (13ns)     1.08 (13ns)     1.08 (13ns)     1.00 (12ns)
Order 4   2.06 (35ns)     2.71 (46ns)     1.06 (18ns)     1.00 (17ns)     1.06 (18ns)     1.06 (18ns)     1.00 (17ns)     1.00 (17ns)
Order 5   1.32 (29ns)     2.00 (44ns)     1.00 (22ns)     1.00 (22ns)     1.05 (23ns)     1.05 (23ns)     1.05 (23ns)     1.05 (23ns)
Order 6   1.38 (36ns)     2.04 (53ns)     1.08 (28ns)     1.00 (26ns)     1.08 (28ns)     1.08 (28ns)     1.35 (35ns)     1.38 (36ns)
Order 7   1.43 (43ns)     2.13 (64ns)     1.03 (31ns)     1.00 (30ns)     1.10 (33ns)     1.03 (31ns)     1.10 (33ns)     1.13 (34ns)
Order 8   1.65 (61ns)     2.22 (82ns)     1.00 (37ns)     1.08 (40ns)     1.14 (42ns)     1.05 (39ns)     1.08 (40ns)     1.11 (41ns)
Order 9   1.39 (57ns)     2.05 (84ns)     1.17 (48ns)     1.17 (48ns)     1.00 (41ns)     1.05 (43ns)     1.15 (47ns)     1.12 (46ns)
Order 10  1.37 (63ns)     2.20 (101ns)    1.22 (56ns)     1.24 (57ns)     1.00 (46ns)     1.00 (46ns)     1.17 (54ns)     1.17 (54ns)
Order 11  1.59 (78ns)     2.24 (110ns)    1.37 (67ns)     1.29 (63ns)     1.22 (60ns)     1.00 (49ns)     1.22 (60ns)     1.22 (60ns)
Order 12  1.46 (83ns)     2.16 (123ns)    1.28 (73ns)     1.26 (72ns)     1.02 (58ns)     1.00 (57ns)     1.07 (61ns)     1.05 (60ns)
Order 13  1.61 (90ns)     2.55 (143ns)    1.32 (74ns)     1.39 (78ns)     1.04 (58ns)     1.00 (56ns)     1.11 (62ns)     1.07 (60ns)
Order 14  1.61 (106ns)    2.23 (147ns)    1.45 (96ns)     1.45 (96ns)     1.02 (67ns)     1.02 (67ns)     1.00 (66ns)     1.09 (72ns)
Order 15  1.49 (119ns)    2.10 (168ns)    1.35 (108ns)    1.35 (108ns)    1.00 (80ns)     1.00 (80ns)     1.00 (80ns)     1.02 (82ns)
Order 16  1.54 (129ns)    1.99 (167ns)    1.49 (125ns)    1.45 (122ns)    1.07 (90ns)     1.00 (84ns)     1.08 (91ns)     1.02 (86ns)
Order 17  1.51 (133ns)    2.02 (178ns)    1.57 (138ns)    1.50 (132ns)    1.02 (90ns)     1.00 (88ns)     1.07 (94ns)     1.06 (93ns)
Order 18  1.53 (148ns)    2.16 (210ns)    1.49 (145ns)    1.57 (152ns)    1.11 (108ns)    1.09 (106ns)    1.00 (97ns)     1.08 (105ns)
Order 19  1.90 (194ns)    2.27 (232ns)    1.62 (165ns)    1.62 (165ns)    1.08 (110ns)    1.00 (102ns)    1.17 (119ns)    1.19 (121ns)
Order 20  1.65 (206ns)    2.08 (260ns)    1.45 (181ns)    1.44 (180ns)    1.00 (125ns)    1.00 (125ns)    1.01 (126ns)    1.03 (129ns)


Table 16.4. Rational Method Comparison with Microsoft Visual C++ version 14.0 on Windows x64

Function  Method 0        Method 0        Method 1        Method 1        Method 2        Method 2        Method 3        Method 3
          (Double         (Integer        (Double         (Integer        (Double         (Integer        (Double         (Integer
          Coefficients)   Coefficients)   Coefficients)   Coefficients)   Coefficients)   Coefficients)   Coefficients)   Coefficients)

Order 2   -               -               2.12 (89ns)     1.95 (82ns)     1.00 (42ns)     1.00 (42ns)     1.00 (42ns)     1.00 (42ns)
Order 3   2.10 (88ns)     2.10 (88ns)     2.05 (86ns)     2.10 (88ns)     1.05 (44ns)     1.00 (42ns)     1.00 (42ns)     1.00 (42ns)
Order 4   2.12 (89ns)     2.21 (93ns)     1.98 (83ns)     2.10 (88ns)     1.02 (43ns)     1.02 (43ns)     1.02 (43ns)     1.00 (42ns)
Order 5   1.07 (90ns)     1.15 (97ns)     1.08 (91ns)     1.00 (84ns)     1.45 (122ns)    1.46 (123ns)    1.45 (122ns)    1.45 (122ns)
Order 6   1.16 (102ns)    1.58 (139ns)    1.00 (88ns)     1.03 (91ns)     1.44 (127ns)    1.44 (127ns)    1.41 (124ns)    1.38 (121ns)
Order 7   1.29 (121ns)    1.44 (135ns)    1.01 (95ns)     1.00 (94ns)     1.38 (130ns)    1.36 (128ns)    1.33 (125ns)    1.36 (128ns)
Order 8   1.33 (134ns)    1.52 (154ns)    1.00 (101ns)    1.08 (109ns)    1.38 (139ns)    1.31 (132ns)    1.39 (140ns)    1.37 (138ns)
Order 9   1.18 (141ns)    1.45 (172ns)    1.00 (119ns)    1.08 (128ns)    1.13 (135ns)    1.26 (150ns)    1.26 (150ns)    1.27 (151ns)
Order 10  1.29 (180ns)    1.28 (178ns)    1.05 (146ns)    1.00 (139ns)    1.06 (147ns)    1.06 (147ns)    1.18 (164ns)    1.17 (163ns)
Order 11  1.28 (187ns)    1.28 (187ns)    1.06 (155ns)    1.05 (154ns)    1.03 (151ns)    1.00 (146ns)    1.19 (174ns)    1.47 (215ns)
Order 12  1.22 (197ns)    1.38 (223ns)    1.04 (168ns)    1.04 (169ns)    1.00 (162ns)    1.04 (169ns)    1.22 (198ns)    1.52 (246ns)
Order 13  1.23 (209ns)    1.29 (220ns)    1.15 (196ns)    1.10 (187ns)    1.00 (170ns)    1.15 (196ns)    1.22 (208ns)    1.61 (273ns)
Order 14  1.28 (242ns)    1.39 (262ns)    1.15 (218ns)    1.14 (216ns)    1.00 (189ns)    1.01 (191ns)    1.49 (282ns)    1.53 (290ns)
Order 15  1.28 (260ns)    1.34 (273ns)    1.12 (227ns)    1.15 (233ns)    1.00 (203ns)    1.00 (203ns)    1.38 (280ns)    1.47 (298ns)
Order 16  1.35 (288ns)    1.40 (300ns)    1.22 (261ns)    1.18 (252ns)    1.00 (214ns)    1.23 (264ns)    1.43 (305ns)    1.52 (325ns)
Order 17  1.16 (259ns)    1.47 (328ns)    1.15 (256ns)    1.35 (302ns)    1.00 (223ns)    1.22 (273ns)    1.50 (334ns)    1.52 (339ns)
Order 18  1.10 (273ns)    1.46 (363ns)    1.10 (273ns)    1.75 (434ns)    1.00 (248ns)    1.30 (322ns)    1.41 (349ns)    1.46 (363ns)
Order 19  1.26 (330ns)    1.35 (352ns)    1.24 (324ns)    1.33 (348ns)    1.00 (261ns)    1.22 (319ns)    1.44 (377ns)    1.46 (381ns)
Order 20  1.24 (330ns)    1.60 (427ns)    1.22 (327ns)    1.56 (416ns)    1.00 (267ns)    1.19 (317ns)    1.57 (418ns)    1.56 (416ns)
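The relative figures in these tables appear to be each timing divided by the fastest timing in the same row, rounded to two decimal places. The sketch below reproduces that calculation; the function name relative_to_fastest is hypothetical and the rounding rule is inferred from the published numbers rather than taken from the test suite.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: scales every timing in a row by the smallest
// timing in that row and rounds to two decimal places.
std::vector<double> relative_to_fastest(const std::vector<double>& nanoseconds)
{
    assert(!nanoseconds.empty());
    const double best = *std::min_element(nanoseconds.begin(), nanoseconds.end());

    std::vector<double> relative;
    relative.reserve(nanoseconds.size());
    for (std::size_t i = 0; i < nanoseconds.size(); ++i)
    {
        relative.push_back(std::round(nanoseconds[i] / best * 100.0) / 100.0);
    }
    return relative;
}
```

For example, the Order 3 row of Table 16.3 has timings of 25, 33, 13, 13, 13, 13, 13 and 12 ns. The fastest is 12 ns, so the first two entries become 25/12 = 2.08 and 33/12 = 2.75, matching the table.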


[table_Polynomial_Method_Comparison_with_GNU_C_version_5_1_0_on_linux]

[table_Rational_Method_Comparison_with_GNU_C_version_5_1_0_on_linux]

