vectorize changes the result of float operations

I just discovered that vectorize can change the result of floating operations. As @Maxim rightly pointed out to me, with floating point operations you can not expect (a+b)+c = a + (b+c)
from algorithm import vectorize

alias dtype=DType.float32
alias SIMD_WIDTH = 2*simdwidthof[dtype]()

alias NUM = 32
fn main():

var v = DTypePointer[dtype]().alloc(NUM)

for i in range(NUM):
v[i] = i*0.2932

fn f1() -> Float32:
var val:Float32 = 0.0
for i in range(NUM):
val += v[i]
return val

fn f2() -> Float32:
var val:Float32 = 0.0
@parameter
fn _op[width: Int](iv: Int):
for j in range(width):
val += v[iv+j]
vectorize[_op, SIMD_WIDTH](size=NUM)

return val

fn f3() -> Float32:
var val:Float32 = 0.0

@parameter
fn _op[width: Int](iv: Int):
for j in range(width):
val += v[iv+width-j-1]
vectorize[_op, SIMD_WIDTH](size=NUM)

return val

fn f4() -> Float32:
var val:Float32 = 0.0
for i in range(NUM):
val += v[NUM-i-1]
return val

fn f5() -> Float32:
var val:Float32 = 0.0
@parameter
fn _op[width: Int](iv: Int):
val += v.load[width=width](iv).reduce_add[1]()
vectorize[_op, SIMD_WIDTH](size=NUM)

return val

print("f1:",f1())
print("f2:",f2(),"\n")
print("f3:",f3(),"\n")
print("f4:",f4())
print("f5:",f5())
from algorithm import vectorize

alias dtype=DType.float32
alias SIMD_WIDTH = 2*simdwidthof[dtype]()

alias NUM = 32
fn main():

var v = DTypePointer[dtype]().alloc(NUM)

for i in range(NUM):
v[i] = i*0.2932

fn f1() -> Float32:
var val:Float32 = 0.0
for i in range(NUM):
val += v[i]
return val

fn f2() -> Float32:
var val:Float32 = 0.0
@parameter
fn _op[width: Int](iv: Int):
for j in range(width):
val += v[iv+j]
vectorize[_op, SIMD_WIDTH](size=NUM)

return val

fn f3() -> Float32:
var val:Float32 = 0.0

@parameter
fn _op[width: Int](iv: Int):
for j in range(width):
val += v[iv+width-j-1]
vectorize[_op, SIMD_WIDTH](size=NUM)

return val

fn f4() -> Float32:
var val:Float32 = 0.0
for i in range(NUM):
val += v[NUM-i-1]
return val

fn f5() -> Float32:
var val:Float32 = 0.0
@parameter
fn _op[width: Int](iv: Int):
val += v.load[width=width](iv).reduce_add[1]()
vectorize[_op, SIMD_WIDTH](size=NUM)

return val

print("f1:",f1())
print("f2:",f2(),"\n")
print("f3:",f3(),"\n")
print("f4:",f4())
print("f5:",f5())
output:
f1: 145.42720031738281
f2: 145.42720031738281

f3: 145.42721557617188

f4: 145.42718505859375
f5: 145.42718505859375
f1: 145.42720031738281
f2: 145.42720031738281

f3: 145.42721557617188

f4: 145.42718505859375
f5: 145.42718505859375
so we have three different results of operations which theoretically should get the same result. Now I wonder how to deal with this, these small drifts can have significant effects of course. (in my case with llm.mojo, it produces different texts than llm.c )
5 Replies
Ehsan M. Kermani (Modular)
Good question! floating point numbers in Mojo already comply with IEEE 754-2008 that means they are not associative (as you pointed out) and it's the same for all programming languages that comply with such protocol. So when it comes down to comparing float values they shouldn't be compared as exact values and use approximate comparison with a tolerance. You can use assert_almost_equal https://docs.modular.com/mojo/stdlib/testing/testing#assert_almost_equal
testing | Modular Docs
Implements various testing utils.
Ehsan M. Kermani (Modular)
If interested there's another aspect of floats like 0.1 + 0.2 != 0.3. Have a look at https://0.30000000000000004.com/ Also check out this important paper https://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf
Martin Dudek
Martin Dudek4mo ago
Thanks Ehsaan. What surprises me in the demo is that f4 which sums up the number in reverse order gets the same result as the standard vectorized version f5. My expectation would have been that f1 leads to the same result. So it seems that vectorize internally reverses the calculation
Maxim
Maxim4mo ago
It could be a coincidence based on the numbers (step) you chose. I think to be sure you need to perform multiple tests with random numbers.
Martin Dudek
Martin Dudek4mo ago
good point, you are right, with 42 the reverse order sum is same as the standard loop, and the vectorized one is different than all the others. Trying different numbers makes clear that there is no obvious rule when which version calculates the same result.