Unnecessary NaN checks: performance issue or missing compile option?

I'm not sure whether this is a performance issue or a feature request, so I figured I'd ask here first.

The issue is a performance regression caused by an unnecessary NaN check that is emitted for (e.g.) max and min operations.

from random import random_ui64
from time import now

fn gen_random_SIMD[T: DType, width: Int]() -> SIMD[T, width]:
    var result = SIMD[T, width]()
    for i in range(width):
        result[i] = random_ui64(0, 100).cast[T]()
    return result

fn main():
    let data0 = gen_random_SIMD[DType.float64, 8]()
    let data1 = gen_random_SIMD[DType.float64, 8]()
    
    let start_time_ns = now()
    let data2 = data0.max(data1)  # we are interested in how max is handled.
    let elapsed_time_ns = now() - start_time_ns

    print(data2)
    print("Elapsed time " + str(elapsed_time_ns) + " ns")


<+278>:   call   0x5470 <clock_gettime@plt>
<+283>:   mov    rbx,QWORD PTR [rsp+0x40]
<+288>:   mov    rax,QWORD PTR [rsp+0x48]
<+293>:   mov    QWORD PTR [rsp+0x70],rax
<+298>:   vmovapd zmm0,ZMMWORD PTR [rsp+0xc0]
<+306>:   vmovapd zmm2,ZMMWORD PTR [rsp+0x100]
<+314>:   vmaxpd zmm1,zmm2,zmm0
<+320>:   vcmpunordpd k1,zmm0,zmm0
<+327>:   vmovapd zmm1{k1},zmm2
<+333>:   vmovapd ZMMWORD PTR [rsp+0xc0],zmm1
...
<+364>:   call   0x5470 <clock_gettime@plt>


+298 and +306 load data0 (zmm0) and data1 (zmm2).
+314 calculates the maximum of zmm0 and zmm2 and stores the result in zmm1.
+320 sets mask register k1 for every lane in which zmm0 (data0) holds a NaN.
+327 overwrites the result (zmm1) with the value of data1 (zmm2) in every lane where zmm0 was NaN.
+333 writes the result back to memory.
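
To make the effect of the three instructions at +314, +320 and +327 concrete, here is a minimal scalar model of a single lane. It is written in Python purely for illustration and is not what Mojo generates. The function names are hypothetical. It assumes the documented behaviour of `vmaxpd`, namely that the second source operand is returned whenever either input is NaN.

```python
import math


def x86_max_lane(src1: float, src2: float) -> float:
    """Model one lane of vmaxpd: src1 if src1 > src2, otherwise src2.

    Every comparison involving NaN is false, so src2 comes back
    whenever either input is NaN.
    """
    return src1 if src1 > src2 else src2


def max_without_fixup(data0: float, data1: float) -> float:
    """Only the instruction at +314: vmaxpd zmm1, zmm2, zmm0."""
    return x86_max_lane(data1, data0)  # src1 = data1 (zmm2), src2 = data0 (zmm0)


def max_with_nan_fixup(data0: float, data1: float) -> float:
    """The full sequence at +314, +320 and +327."""
    result = x86_max_lane(data1, data0)  # +314
    if math.isnan(data0):                # +320: k1 set where data0 is NaN
        result = data1                   # +327: masked move of data1 into the result
    return result


if __name__ == "__main__":
    nan = float("nan")
    for data0, data1 in [(1.0, 2.0), (4.0, nan), (nan, 4.0)]:
        print(data0, data1,
              max_without_fixup(data0, data1),
              max_with_nan_fixup(data0, data1))
```

In this model the two versions differ only when data0 is NaN: the bare `vmaxpd` then propagates the NaN, while the fixed-up version returns data1. That matches the "return the non-NaN operand" behaviour of C's `fmax`, which appears to be what the compiler is implementing here. For NaN-free inputs the extra compare and masked move never change the result.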

If data0 could contain NaN values, the above assembly would be correct. But when data0 is known to be NaN-free, the code carries a performance penalty, because a NaN check is performed for every floating-point min/max operation. This is something I would like to be able to control in HPC and AI workloads.

Q: Is this a regression bug, or something else (for which I need to make a feature request)?