Runpod•4mo ago

Config.yaml - invalid dataset format?

I'm having trouble running axolotl train config.yaml to fine-tune mistral v1 on my own data. I'm getting a lot of confusing errors, and the AI feedback keeps pointing to my dataset formatting as the problem. Currently, I have it like this:

datasets: - path: vitalune/business-assistant-ai-tools type: field_instruction: prompt field_output: response prompt_template: mistral

My dataset JSONL file is formatted like so:

{"prompt": "...", "response": "..."}

So I'm not sure what's wrong with my formatting, or what I should do.
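One way to rule the data file in or out before blaming the config is to check that every JSONL row really has the two fields the config maps (prompt and response). The following is only a minimal sketch: the helper name find_bad_rows and the sample rows are made up for illustration, and it does not reproduce whatever validation axolotl itself performs.

```python
import json

# Field names taken from the config in the post (field_instruction / field_output).
REQUIRED_KEYS = ("prompt", "response")


def find_bad_rows(lines):
    """Return (line_number, reason) pairs for rows that do not match the field mapping."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(row, dict):
            problems.append((number, "row is not a JSON object"))
            continue
        for key in REQUIRED_KEYS:
            if not isinstance(row.get(key), str):
                problems.append((number, f"missing or non-string field: {key}"))
    return problems


# Hypothetical sample rows; with a real file, pass open("data.jsonl") instead.
sample = [
    '{"prompt": "Hi", "response": "Hello"}',
    '{"prompt": "Only a prompt"}',
    "not json",
]
for line_number, reason in find_bad_rows(sample):
    print(f"line {line_number}: {reason}")
```

If this reports nothing for the real file, the rows at least match the field names in the config, and the cause is more likely elsewhere in config.yaml.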

4 Replies

Unknown User•4mo ago

Message Not Public

AmirOP•4mo ago

Looks like I got it working, and the error was related to sample packing, not the formatting. Ran into another one though:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU 0 has a total capacity of 47.32 GiB of which 426.50 MiB is free. Process 2997536 has 46.90 GiB memory in use. Of the allocated memory 45.86 GiB is allocated by PyTorch, and 733.85 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management
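The numbers in that message can be read a bit more closely. The sketch below just redoes the arithmetic from the error text; the variable names are made up for illustration, and the interpretation is a rough reading rather than a diagnosis.

```python
MIB_PER_GIB = 1024

# Figures copied from the error message above.
total_gib = 47.32             # total capacity of GPU 0
allocated_gib = 45.86         # allocated by PyTorch (live tensors)
reserved_unused_mib = 733.85  # reserved by PyTorch but unallocated
free_mib = 426.50             # free on the device
requested_mib = 448.00        # the allocation that failed

# Memory that is not holding live tensors: device-free plus cached-but-unused.
reclaimable_mib = free_mib + reserved_unused_mib

# Share of the card held by live tensors vs. lost to allocator caching.
allocated_share = allocated_gib / total_gib
cached_unused_share = (reserved_unused_mib / MIB_PER_GIB) / total_gib

print(f"not in use by tensors: {reclaimable_mib:.2f} MiB (requested {requested_mib:.2f} MiB)")
print(f"live tensors: {allocated_share:.1%} of the card")
print(f"cached but unused: {cached_unused_share:.1%} of the card")
```

Under this reading, almost all of the card is held by live tensors and only a small slice is cached but unused, so the allocator setting suggested in the message may help at the margin, while lowering the per-step memory footprint is more likely to be what matters.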

Unknown User•4mo ago

Message Not Public

AmirOP•4mo ago

I halved micro_batch_size from 2 to 1 and doubled gradient_accumulation_steps from 4 to 8. Retrying now.
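The reason this pair of changes goes together: the number of samples contributing to each optimizer step stays the same, while fewer samples sit on the GPU at once. A small sketch of that arithmetic, assuming a single GPU (the error only mentions GPU 0); the function name is made up for illustration.

```python
def effective_batch_size(micro_batch_size, gradient_accumulation_steps, num_gpus=1):
    """Samples contributing to one optimizer step (num_gpus=1 is an assumption)."""
    return micro_batch_size * gradient_accumulation_steps * num_gpus


before = effective_batch_size(micro_batch_size=2, gradient_accumulation_steps=4)
after = effective_batch_size(micro_batch_size=1, gradient_accumulation_steps=8)

print(before, after)  # 8 8
```

So the training dynamics should stay roughly comparable; the trade-off is more forward/backward passes per optimizer step.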
