Splitting a loop between vectorize and parallelize
My first approach when optimizing a single loop is to apply vectorize. Now I wonder whether it sometimes makes sense to transform the single loop into a nested loop, vectorizing the inner loop and parallelizing the outer one.
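That is, instead of vectorizing the single flat loop directly, I would split its index into an outer index k and an inner index j, and then vectorize over j and parallelize over k.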
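A minimal sketch of the split described above, written in Python so it runs anywhere. It assumes the loop is an elementwise parameter update `params[i] -= lr * grads[i]`. The names `update_chunk`, `update_parallel` and `SIMD_WIDTH` are hypothetical. CPython threads will not actually speed this up because of the GIL. The point is the index decomposition: each outer task k owns a contiguous range, and inside that range the elements are processed in fixed-width blocks plus a scalar tail, which is roughly the work that Mojo's `parallelize` (outer) and `vectorize` (inner) would take over.

```python
from concurrent.futures import ThreadPoolExecutor

SIMD_WIDTH = 8  # hypothetical vector width in elements; the real value depends on dtype and CPU


def update_chunk(params, grads, lr, start, end):
    """Inner loop over j: process [start, end) in SIMD_WIDTH blocks, then a scalar tail."""
    i = start
    # Full-width blocks: this is the part a vectorizer would turn into SIMD instructions.
    while i + SIMD_WIDTH <= end:
        for lane in range(SIMD_WIDTH):
            params[i + lane] -= lr * grads[i + lane]
        i += SIMD_WIDTH
    # Remaining elements that do not fill a whole vector.
    while i < end:
        params[i] -= lr * grads[i]
        i += 1


def update_parallel(params, grads, lr, num_chunks):
    """Outer loop over k: split [0, n) into num_chunks contiguous ranges, one task each."""
    n = len(params)
    chunk = (n + num_chunks - 1) // num_chunks  # ceiling division so every index is covered
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(update_chunk, params, grads, lr, k * chunk, min((k + 1) * chunk, n))
            for k in range(num_chunks)
            if k * chunk < n  # skip empty trailing chunks when num_chunks is large
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker
```

Because the chunks are disjoint, the tasks never write to the same element, so no locking is needed. In Mojo, the body of `update_chunk` would roughly correspond to the closure handed to `vectorize` and the loop over k in `update_parallel` to the work handed to `parallelize`; the exact signatures have changed between Mojo versions, so check the current docs.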
If it makes sense, how do I find a good balance between vectorizing and parallelizing? In my concrete example, I have a loop of around 120 million .... (updating parameters in llm.mojo)
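One common rule of thumb for the balance, sketched below as an assumption rather than a known-best setting: let the vector width dictate the inner loop, and choose the number of outer chunks as a small multiple of the core count so the threads stay load-balanced, while keeping each chunk large enough that per-task overhead is amortized. The function `choose_chunking` and its defaults (`tasks_per_core=4`, `min_chunk=65536`, `simd_width=8`) are hypothetical starting points to tune by benchmarking.

```python
import os


def choose_chunking(n, cores=None, simd_width=8, tasks_per_core=4, min_chunk=65536):
    """Return (num_chunks, chunk_size) for splitting n elements across threads.

    Heuristic: aim for tasks_per_core chunks per core, never go below min_chunk
    elements per chunk, and round the chunk size up to a multiple of simd_width
    so that only the final chunk has a scalar tail.
    """
    if cores is None:
        cores = os.cpu_count() or 1
    target_chunks = max(1, cores * tasks_per_core)
    chunk = -(-n // target_chunks)  # ceiling division
    chunk = max(chunk, min_chunk)  # avoid tiny tasks dominated by scheduling overhead
    chunk = -(-chunk // simd_width) * simd_width  # round up to a whole number of vectors
    num_chunks = -(-n // chunk)
    return num_chunks, chunk
```

With these defaults, n = 120_000_000 on an 8-core machine gives 32 chunks of 3,750,000 elements each. An elementwise update like this reads and writes each element once, so it is likely to be memory-bandwidth bound; past a certain thread count, adding more chunks may not help, which is why measuring a few settings is worth more than the exact formula.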
I also wonder whether the compiler detects these optimizations anyway, in which case it would be better to keep the code simple and let the compiler handle this kind of standard optimization.
Thanks
