Config.yaml - invalid dataset format?
I'm having trouble running axolotl train config.yaml to fine-tune mistral v1 on my own data. I'm getting a lot of confusing errors, and the AI feedback I've tried keeps pointing at my dataset formatting as the problem. Currently, I have it like this:
datasets:
  - path: vitalune/business-assistant-ai-tools
    type:
      field_instruction: prompt
      field_output: response
      prompt_template: mistral
My dataset jsonl file is formatted like so:
{"prompt": "...", "response": "..."}
So I'm not sure what the problem is with my formatting, or what I should do.
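One way to rule the data file in or out is to check every row before training. The following is a minimal sketch, not part of axolotl: the helper name find_bad_rows and the sample rows are made up for illustration, and the only thing taken from the post is the pair of field names, prompt and response.

```python
import json

# Field names taken from the config in the post; everything else here is hypothetical.
REQUIRED_FIELDS = ("prompt", "response")


def find_bad_rows(lines, required=REQUIRED_FIELDS):
    """Return (line_number, reason) for every row that is not usable."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # blank lines are skipped rather than reported
        try:
            row = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(row, dict):
            problems.append((number, "row is not a JSON object"))
            continue
        for field in required:
            value = row.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append((number, f"missing or empty field: {field}"))
    return problems


# Made-up sample rows: one valid, one missing "response", one that is not JSON.
sample = [
    '{"prompt": "What is a CRM?", "response": "A customer relationship management tool."}',
    '{"prompt": "No answer here"}',
    "not json at all",
]

for line_number, reason in find_bad_rows(sample):
    print(f"line {line_number}: {reason}")
```

To check a real file, pass an open file handle instead of the sample list, for example find_bad_rows(open("data.jsonl", encoding="utf-8")). An empty result means every row has both fields as non-empty strings, which points the search away from the dataset and toward the rest of the config.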
4 Replies
Looks like I got it working, and the error was related to sample packing, not the formatting. Ran into another one though: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU 0 has a total capacity of 47.32 GiB of which 426.50 MiB is free. Process 2997536 has 46.90 GiB memory in use. Of the allocated memory 45.86 GiB is allocated by PyTorch, and 733.85 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management
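The numbers in that message can be read as a rough budget. The sketch below only does arithmetic on the figures quoted in the error; the variable names are my own, and the interpretation in the comments is a reading of those figures, not something PyTorch reports.

```python
MIB_PER_GIB = 1024

# Figures copied from the error message.
total_gib = 47.32                  # total capacity of GPU 0
allocated_gib = 45.86              # allocated by PyTorch
reserved_unallocated_mib = 733.85  # reserved by PyTorch but unallocated
requested_mib = 448.00             # size of the failed allocation
free_mib = 426.50                  # free memory on the device

# Share of the card that is live PyTorch allocations.
allocated_share = allocated_gib / total_gib

# Share of the card held in PyTorch's cache but not in use.
fragmentation_share = (reserved_unallocated_mib / MIB_PER_GIB) / total_gib

# How far the request exceeded the free device memory.
shortfall_mib = requested_mib - free_mib

print(f"allocated by PyTorch:   {allocated_share:.1%} of the card")
print(f"reserved, unallocated:  {fragmentation_share:.1%} of the card")
print(f"request exceeded free memory by {shortfall_mib:.2f} MiB")
```

On these figures, about 97% of the card is live allocations and only about 1.5% is cached but unused. The allocator setting suggested in the message might recover some of that cache, but the card is essentially full, so lowering the per-step memory footprint looks like the more robust fix.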
I halved micro_batch_size from 2 to 1 and doubled gradient_accumulation_steps from 4 to 8. Retrying now.
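That change trades memory for time without changing how many samples contribute to each optimizer step. A quick sketch of the arithmetic, assuming a single GPU (the error only mentions GPU 0) and using a hypothetical helper name:

```python
def effective_batch_size(micro_batch_size, gradient_accumulation_steps, num_gpus=1):
    """Samples that contribute to one optimizer step."""
    return micro_batch_size * gradient_accumulation_steps * num_gpus


before = effective_batch_size(micro_batch_size=2, gradient_accumulation_steps=4)
after = effective_batch_size(micro_batch_size=1, gradient_accumulation_steps=8)

print(before, after)  # 8 8
```

Each forward and backward pass now holds activations for one sample instead of two, which is where the memory saving comes from, while the gradient is still accumulated over 8 samples before each update.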