Not that it matters; the sweet spot is still around epoch 200 and a LoRA strength of 0.95-1.0.
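For anyone wanting to try those numbers, here is a minimal sketch of applying a LoRA checkpoint at a given strength with diffusers. The base model ID, checkpoint filename, and prompt are placeholders, not from this thread; only the epoch/strength values come from the post:

```python
# Hypothetical example: applying an epoch-200 LoRA at strength 0.95 with diffusers.
# Model ID, path, and prompt are placeholders; the 0.95-1.0 scale is what the post reports.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the trained LoRA weights (hypothetical filename for the epoch-200 checkpoint).
pipe.load_lora_weights("./output/my_lora-000200.safetensors")

# "LoRA strength" is the scale applied to the LoRA deltas at inference time.
image = pipe(
    "a portrait in the trained style",
    cross_attention_kwargs={"scale": 0.95},
).images[0]
image.save("sample.png")
```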




`decoderF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 128
    xFormers wasn't build with CUDA support
    attn_bias type is <class 'NoneType'>
    operator wasn't built - see `python -m xformers.info` for more info
`flshattF@0.0.0` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 256
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
`cutlassF` is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support
    dtype=torch.bfloat16 (supported: {torch.float32})
    operator wasn't built - see `python -m xformers.info` for more info
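Since every operator above reports "xFormers wasn't build with CUDA support", the installed wheel is almost certainly a CPU-only build. A minimal sketch to confirm from Python (public torch/xformers APIs only; the tensor shapes are illustrative):

```python
# Quick diagnostic for the failure above; uses only public torch/xformers APIs.
import torch
import xformers
import xformers.ops as xops

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("xformers", xformers.__version__)  # `python -m xformers.info` prints the full operator table

device = "cuda" if torch.cuda.is_available() else "cpu"
# memory_efficient_attention expects (batch, seq_len, heads, head_dim)
q = torch.randn(1, 128, 8, 64, device=device, dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)

try:
    out = xops.memory_efficient_attention(q, k, v)
    print("memory_efficient_attention OK:", tuple(out.shape))
except NotImplementedError as err:
    # A CPU-only xFormers build raises NotImplementedError with reasons like those pasted above.
    print(err)
```

If `torch.cuda.is_available()` is True but xFormers still reports no CUDA build, reinstalling an xFormers wheel built against the same torch/CUDA version usually clears it.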
