Modular · 2y ago
Martin Dudek

vectorize changes the result of float operations

I just discovered that vectorize can change the result of floating-point operations. As @Maxim rightly pointed out to me, with floating-point arithmetic you cannot expect (a+b)+c = a+(b+c), because every intermediate result is rounded.
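To make the non-associativity concrete, here is a minimal sketch in Mojo. The values are chosen purely for illustration (they are not from llm.mojo) and the code assumes IEEE 754 Float32 arithmetic with no fast-math reassociation. Near 1e8 the spacing between adjacent Float32 values is 8, so adding 1 to -1e8 rounds straight back to -1e8.

```mojo
fn main():
    # Illustrative values only: a and b cancel exactly, c is tiny by comparison.
    var a: Float32 = 100000000.0
    var b: Float32 = -100000000.0
    var c: Float32 = 1.0

    # (a + b) is exactly 0, so adding c afterwards evaluates to 1.
    var left = (a + b) + c

    # (b + c) is rounded back to -1e8 in Float32, so the total evaluates to 0.
    var right = a + (b + c)

    print("(a+b)+c =", left)
    print("a+(b+c) =", right)
```

The same three numbers give two different answers depending only on the grouping, which is the effect any reordering of a summation loop can trigger.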

from algorithm import vectorize

alias dtype=DType.float32
alias SIMD_WIDTH = 2*simdwidthof[dtype]()

alias NUM = 32
fn main():

    var v = DTypePointer[dtype]().alloc(NUM)

    for i in range(NUM):
        v[i] = i*0.2932
    
    # f1: plain scalar loop, adds the elements in index order 0 .. NUM-1
    fn f1() -> Float32:
        var val:Float32 = 0.0
        for i in range(NUM):
            val += v[i]
        return val

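    # f2: vectorize, but each chunk is still added element by element in forward order (same order as f1)
    fn f2() -> Float32: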
        var val:Float32 = 0.0
        @parameter
        fn _op[width: Int](iv: Int):
            for j in range(width):
                val += v[iv+j]
        vectorize[_op, SIMD_WIDTH](size=NUM)
       
        return val

    # f3: vectorize, with the elements inside each chunk added in reverse order
    fn f3() -> Float32:
        var val:Float32 = 0.0

        @parameter
        fn _op[width: Int](iv: Int):
            for j in range(width):
                val += v[iv+width-j-1]
        vectorize[_op, SIMD_WIDTH](size=NUM)
       
        return val

    # f4: plain scalar loop over the whole array in reverse order
    fn f4() -> Float32:
        var val:Float32 = 0.0
        for i in range(NUM):
            val += v[NUM-i-1]
        return val

    # f5: vectorize with one SIMD load per chunk, reduced with reduce_add before accumulating
    fn f5() -> Float32:
        var val:Float32 = 0.0
        @parameter
        fn _op[width: Int](iv: Int):
            val += v.load[width=width](iv).reduce_add[1]()
        vectorize[_op, SIMD_WIDTH](size=NUM)
       
        return val

    print("f1:",f1())
    print("f2:",f2(),"\n")
    print("f3:",f3(),"\n")
    print("f4:",f4())
    print("f5:",f5())

    # release the buffer allocated at the top of main
    v.free()

output:
f1: 145.42720031738281
f2: 145.42720031738281 

f3: 145.42721557617188 

f4: 145.42718505859375
f5: 145.42718505859375

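So we get three different results from operations that should theoretically all produce the same value. f2 matches f1 because it adds the elements in the same order; f3, f4, and f5 each change the order in which the additions happen.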

Now I wonder how to deal with this, since these small drifts can of course have significant effects. (In my case, llm.mojo produces different text than llm.c.)
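One common way to shrink this kind of drift is to accumulate in a wider type and round to Float32 only once at the end. The sketch below is not the llm.mojo or llm.c implementation. It reuses the same test data and assumes the same Mojo version and `DTypePointer` API as the snippet above; the names `fwd` and `rev` are made up for illustration.

```mojo
alias dtype = DType.float32
alias NUM = 32

fn main():
    var v = DTypePointer[dtype]().alloc(NUM)
    for i in range(NUM):
        v[i] = i*0.2932

    # Accumulate in Float64: one sum in forward order, one in reverse order.
    var fwd: Float64 = 0.0
    var rev: Float64 = 0.0
    for i in range(NUM):
        fwd += v[i].cast[DType.float64]()
        rev += v[NUM-i-1].cast[DType.float64]()

    # Round back to Float32 only once, after the whole sum is done.
    print("forward :", fwd.cast[dtype]())
    print("reversed:", rev.cast[dtype]())

    v.free()
```

For this small input every Float64 partial sum is exactly representable, so the two orders should agree. In general a wider accumulator only reduces the error, it does not remove the order dependence. It also will not reproduce a Float32 reference bit for bit; to match llm.c exactly, the additions would have to happen in the same order and precision as they do there.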