Llama-70B 3.1 execution and queue delay times are much larger than with 3.0. Why?
I deployed these two models, which seem to use the same techniques. I'm running both on the same machine (2x80GB), but the execution time and queue delay time differ massively:

Queue delay:
Llama70B 3.0: 0.02 secs
Llama70B 3.1: 0.15 secs