Summing large lists of floating point numbers in Python¶
Adding a large list of numbers directly may not be accurate as the floating point errors accumulate. The simplest way to see this is to choose an array a
of size $N+1$, where the first entry is large and the remainder are $1$.
If the first entry is large enough then sum(a) - a[0]
will not be N
.
import numpy as np, math
N = 100
a = np.ones( N+1 )
# Choose a[0] large enough so that (a[0] + 1) - a[0] = 0
# This way we will certainly see cancelation errors when we sum the array a
a[0] = 1e16
(a[0] + 1) - a[0]
np.float64(0.0)
# Directly check sum(a) - a[0] is not N
sum(a) - a[0]
np.float64(0.0)
# Just in case python sum is doing some magic, let's run a for loop and check exactly
s = 0
for t in a: s += t
s - a[0]
np.float64(0.0)
numpy.sum()¶
numpy.sum uses a partial pairwise summation algorithm that is fast and more accurate than directly summing.
# Numpy sum uses a fast pairwise sum; it's fast and more accurate than directly summing
np.sum(a) - a[0]
np.float64(84.0)
math.fsum( a ) - a[0]
np.float64(100.0)
Errors can't be completely eliminated.¶
It's easy to see that no matter what algorithm is used, errors can't be completely eliminated.
So even math.fsum()
will fail if we increase a[0]
enough. One case where it will fail is if the next representable float after a[0]
is larger than a[0] + N
. In Python, the next representable float can be found using np.nextafter()
x = 1e19
np.nextafter( x, np.inf ) - x
np.float64(2048.0)
Let's confirm that if we choose a[0] = x
, then even the more accurate math.fsum()
will fail.
a[0] = x
math.fsum( a ) - a[0] # Gives 0, even for the more accurate algorithm
np.float64(0.0)
In addition to these, another algorithm is compensated summation (also called Kahan summation). The algorithms are all simple to implement if you want to play with them.