Runpod · 4mo ago
Amir

Config.yaml - invalid dataset format?

I'm having trouble running axolotl train config.yaml to fine-tune mistral v1 on my own data. I'm getting a lot of errors that don't make sense to me, and the AI feedback I've gotten keeps pointing at my dataset formatting. Currently, the datasets section of my config looks like this:

datasets:
  - path: vitalune/business-assistant-ai-tools
    type:
      field_instruction: prompt
      field_output: response
      prompt_template: mistral

My dataset JSONL file is formatted like so:

{"prompt": "...", "response": "..."}

So I'm not sure what's wrong with my formatting, or what I should do.
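Before digging into the config, one way to rule the data file in or out is to check that every JSONL line parses and carries the two fields the config maps (prompt and response). Here is a minimal sketch using only the standard library. The function name check_jsonl and the sample records are made up for illustration, and this is not part of axolotl.

```python
import json


def check_jsonl(lines, required=("prompt", "response")):
    """Return (line_number, problem) pairs for records that look wrong."""
    problems = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((n, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((n, "record is not a JSON object"))
            continue
        for key in required:
            value = record.get(key)
            # each mapped field should be a non-empty string
            if not isinstance(value, str) or not value.strip():
                problems.append((n, f"missing or empty field: {key}"))
    return problems


# Hypothetical sample records, not taken from the real dataset.
sample = [
    '{"prompt": "What is a CRM?", "response": "A tool for tracking customers."}',
    '{"prompt": "No response here"}',
    "not json at all",
]

for line_number, problem in check_jsonl(sample):
    print(line_number, problem)
```

To run it on a real file, pass an open file handle instead of the sample list, for example check_jsonl(open("data.jsonl", encoding="utf-8")), where data.jsonl is a placeholder path. An empty result means the file at least matches the prompt/response shape, so the problem is probably elsewhere in the config.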
4 Replies
Amir (OP) · 4mo ago
Looks like I got it working. The error was related to sample packing, not the formatting. I ran into another one though:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU 0 has a total capacity of 47.32 GiB of which 426.50 MiB is free. Process 2997536 has 46.90 GiB memory in use. Of the allocated memory 45.86 GiB is allocated by PyTorch, and 733.85 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management
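The numbers in that error message can be turned into a quick sanity check on whether fragmentation is the main issue. This is just arithmetic on the figures quoted above; the variable names are mine, and the interpretation is a rough reading rather than a diagnosis.

```python
MIB_PER_GIB = 1024

# Figures copied from the error message above.
requested_mib = 448.00
free_mib = 426.50
allocated_gib = 45.86
reserved_unallocated_mib = 733.85

# Memory PyTorch holds = live tensors + cached-but-unused blocks.
reserved_unallocated_gib = reserved_unallocated_mib / MIB_PER_GIB
held_by_pytorch_gib = allocated_gib + reserved_unallocated_gib

# Share of PyTorch's memory that is cached slack rather than live tensors.
slack_fraction = reserved_unallocated_gib / held_by_pytorch_gib

# How far the request exceeds what the device still has free.
shortfall_mib = requested_mib - free_mib

print(f"held by PyTorch: {held_by_pytorch_gib:.2f} GiB")
print(f"cached slack:    {slack_fraction:.1%} of that")
print(f"shortfall:       {shortfall_mib:.1f} MiB")
```

The cached slack is larger than the 448 MiB request, so expandable_segments might get past this particular allocation. It is only around 1.5% of what PyTorch holds, though, so the run is sitting at the card's limit either way, and lowering per-step memory use is likely the more robust fix.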
Amir (OP) · 4mo ago
I halved micro_batch_size from 2 to 1 and doubled gradient_accumulation_steps from 4 to 8. Retrying now.
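For anyone wondering why those two changes go together: the number of samples contributing to each optimizer step is the product of the two settings, so halving one and doubling the other keeps it the same while each forward/backward pass handles fewer samples. A small sketch of that arithmetic follows. The helper name and the num_gpus=1 default are assumptions (the error above only mentions GPU 0).

```python
def effective_batch_size(micro_batch_size, gradient_accumulation_steps, num_gpus=1):
    """Samples contributing to one optimizer step."""
    return micro_batch_size * gradient_accumulation_steps * num_gpus


before = effective_batch_size(micro_batch_size=2, gradient_accumulation_steps=4)
after = effective_batch_size(micro_batch_size=1, gradient_accumulation_steps=8)

print(before, after)
```

Both settings give the same effective batch size, so the optimizer sees the same amount of data per update. The trade-off is more forward/backward passes per step, so each step takes longer.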
