Modular•13mo ago

vectorize changes the result of float operations

I just discovered that vectorize can change the result of floating operations. As @Maxim rightly pointed out to me, with floating point operations you can not expect (a+b)+c = a + (b+c)

from algorithm import vectorize

alias dtype=DType.float32
alias SIMD_WIDTH = 2*simdwidthof[dtype]()

alias NUM = 32
fn main():

    var v = DTypePointer[dtype]().alloc(NUM)

    for i in range(NUM):
        v[i] = i*0.2932
    
    fn f1() -> Float32:
        var val:Float32 = 0.0
        for i in range(NUM):
            val += v[i]
        return val

    fn f2() -> Float32:
        var val:Float32 = 0.0
        @parameter
        fn _op[width: Int](iv: Int):
            for j in range(width):
                val += v[iv+j]
        vectorize[_op, SIMD_WIDTH](size=NUM)
       
        return val

    fn f3() -> Float32:
        var val:Float32 = 0.0

        @parameter
        fn _op[width: Int](iv: Int):
            for j in range(width):
                val += v[iv+width-j-1]
        vectorize[_op, SIMD_WIDTH](size=NUM)
       
        return val

    fn f4() -> Float32:
        var val:Float32 = 0.0
        for i in range(NUM):
            val += v[NUM-i-1]
        return val

    fn f5() -> Float32:
        var val:Float32 = 0.0
        @parameter
        fn _op[width: Int](iv: Int):
            val += v.load[width=width](iv).reduce_add[1]()
        vectorize[_op, SIMD_WIDTH](size=NUM)
       
        return val

    print("f1:",f1())
    print("f2:",f2(),"\n")
    print("f3:",f3(),"\n")
    print("f4:",f4())
    print("f5:",f5())

from algorithm import vectorize

alias dtype=DType.float32
alias SIMD_WIDTH = 2*simdwidthof[dtype]()

alias NUM = 32
fn main():

    var v = DTypePointer[dtype]().alloc(NUM)

    for i in range(NUM):
        v[i] = i*0.2932
    
    fn f1() -> Float32:
        var val:Float32 = 0.0
        for i in range(NUM):
            val += v[i]
        return val

    fn f2() -> Float32:
        var val:Float32 = 0.0
        @parameter
        fn _op[width: Int](iv: Int):
            for j in range(width):
                val += v[iv+j]
        vectorize[_op, SIMD_WIDTH](size=NUM)
       
        return val

    fn f3() -> Float32:
        var val:Float32 = 0.0

        @parameter
        fn _op[width: Int](iv: Int):
            for j in range(width):
                val += v[iv+width-j-1]
        vectorize[_op, SIMD_WIDTH](size=NUM)
       
        return val

    fn f4() -> Float32:
        var val:Float32 = 0.0
        for i in range(NUM):
            val += v[NUM-i-1]
        return val

    fn f5() -> Float32:
        var val:Float32 = 0.0
        @parameter
        fn _op[width: Int](iv: Int):
            val += v.load[width=width](iv).reduce_add[1]()
        vectorize[_op, SIMD_WIDTH](size=NUM)
       
        return val

    print("f1:",f1())
    print("f2:",f2(),"\n")
    print("f3:",f3(),"\n")
    print("f4:",f4())
    print("f5:",f5())

output:

f1: 145.42720031738281
f2: 145.42720031738281 

f3: 145.42721557617188 

f4: 145.42718505859375
f5: 145.42718505859375

f1: 145.42720031738281
f2: 145.42720031738281 

f3: 145.42721557617188 

f4: 145.42718505859375
f5: 145.42718505859375

so we have three different results of operations which theoretically should get the same result. Now I wonder how to deal with this, these small drifts can have significant effects of course. (in my case with llm.mojo, it produces different texts than llm.c )

5 Replies

Ehsan M. Kermani (Modular)•13mo ago

Good question! floating point numbers in Mojo already comply with IEEE 754-2008 that means they are not associative (as you pointed out) and it's the same for all programming languages that comply with such protocol. So when it comes down to comparing float values they shouldn't be compared as exact values and use approximate comparison with a tolerance. You can use assert_almost_equal https://docs.modular.com/mojo/stdlib/testing/testing#assert_almost_equal

testing | Modular Docs

Implements various testing utils.

Ehsan M. Kermani (Modular)•13mo ago

If interested there's another aspect of floats like 0.1 + 0.2 != 0.3. Have a look at https://0.30000000000000004.com/ Also check out this important paper https://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf

Martin DudekOP•13mo ago

Thanks Ehsaan. What surprises me in the demo is that f4 which sums up the number in reverse order gets the same result as the standard vectorized version f5. My expectation would have been that f1 leads to the same result. So it seems that vectorize internally reverses the calculation

Maxim•13mo ago

It could be a coincidence based on the numbers (step) you chose. I think to be sure you need to perform multiple tests with random numbers.

Martin DudekOP•13mo ago

good point, you are right, with 42 the reverse order sum is same as the standard loop, and the vectorized one is different than all the others. Trying different numbers makes clear that there is no obvious rule when which version calculates the same result.

Gaming

Programming

vectorize changes the result of float operations

Did you find this page helpful?