Modular · 2y ago
10 replies
Martin Dudek

splitting vectorize and parallelize

My first approach when optimizing a single loop is to apply vectorize. Now I wonder whether in some cases it makes sense to transform the single loop into a nested loop, vectorizing the inner loop and parallelizing the outer one.

Instead of vectorizing

for i in range(12):
    ...

using

for k in range(4):
    for j in range(3):
        var i = 3*k + j
        ...

and then vectorize over j and parallelize over k.
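To make the index mapping concrete, here is a minimal sketch in plain Python rather than Mojo (it contains no actual vectorize or parallelize calls, only the loop structure). The names `split_indices`, `n_outer`, and `n_inner` are made up for illustration; the final check confirms that the nested form visits the same indices, in the same order, as the flat loop.

```python
def split_indices(n_outer, n_inner):
    """Visit i = n_inner * k + j using the nested loop structure.

    The outer loop over k is the candidate for parallelizing,
    the inner loop over j is the candidate for vectorizing.
    """
    visited = []
    for k in range(n_outer):
        for j in range(n_inner):
            i = n_inner * k + j  # same mapping as 3*k + j when n_inner is 3
            visited.append(i)
    return visited


flat = list(range(12))
nested = split_indices(4, 3)
print(nested == flat)  # prints True
```

This only works this cleanly because 12 = 4 * 3; a loop length that is not an exact product would need a remainder step.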

If it makes sense, how do I find a good balance between vectorize and parallelize? In my concrete example, I have a loop of around 120 million .... (updating parameters in llm.mojo)
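One split I could imagine, written as a hedged Python sketch and not as Mojo code: give each worker one contiguous chunk and round the chunk length up to a multiple of the SIMD width, so that only the last chunk can end with a partial vector. The function `plan_chunks` and the values for `simd_width` and `num_workers` are assumptions for illustration, not measured or recommended settings, and the sketch assumes `n` is at least 1.

```python
def plan_chunks(n, simd_width, num_workers):
    """Split range(n) into at most num_workers contiguous chunks.

    Each chunk length is rounded up to a multiple of simd_width,
    so only the final chunk can end with a partial vector.
    Assumes n >= 1, simd_width >= 1 and num_workers >= 1.
    """
    per_worker = -(-n // num_workers)                  # ceil(n / num_workers)
    chunk = -(-per_worker // simd_width) * simd_width  # round up to a SIMD multiple
    return [(start, min(start + chunk, n)) for start in range(0, n, chunk)]


# Hypothetical values: 8 SIMD lanes, 10 workers, n roughly the loop size above.
chunks = plan_chunks(120_000_000, 8, 10)
print(len(chunks), chunks[0], chunks[-1])
```

Each `(start, end)` pair would be one unit of work for the outer (parallel) loop, with the inner (vectorized) loop running from `start` to `end`. Whether one chunk per worker is actually the best balance is exactly what I am unsure about.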

What I also wonder in this regard is whether the compiler detects these optimizations anyway, in which case it would be better to keep the code simple and let the compiler do this type of standard optimization.

Thanks