What is the fastest way to do matrix multiplication in C#? Is it still Intel MKL, or would you suggest ILGPU or managedCuda?