I'm looking to do a finetune with fp8, but the Dr. setting for it uses shared memory and is slow, at around 10 s/it. After speaking to kohya, though, it seems that using shared memory is intended behavior.