How to deploy Llama3 on Aphrodite Engine (RunPod)

I have set up the following settings for a pod with 48 GB of RAM.
1) I'm not sure how to enable the Q4 cache; without it, the 5.0bpw quant won't fit. Any advice, please? (See attached)
2) I get an error that config.json can't be found. It seems the REVISION variable is not being taken into account. The docs say:

REVISION: The HuggingFace branch name, it defaults to the main branch.

I think that's a bug.
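For reference, this is roughly the template configuration I'm aiming for. It's a sketch: apart from REVISION (which the docs name), the variable names are assumptions inferred from the CLI flags visible in the log below.

```shell
# RunPod template environment variables (sketch; names other than REVISION
# are assumptions inferred from the flags in the launch command below)
MODEL_NAME=turboderp/Llama-3-70B-Instruct-exl2
REVISION=5.0bpw              # HuggingFace branch holding the 5.0bpw quant
KV_CACHE_DTYPE=fp8_e5m2      # fp8 KV cache; unclear how to select Q4 instead
GPU_MEMORY_UTILIZATION=1.0
```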

2024-05-06T19:18:00.996748225Z Starting Aphrodite Engine API server...
2024-05-06T19:18:00.996849854Z + exec python3 -m aphrodite.endpoints.openai.api_server --host 0.0.0.0 --port 7860 --download-dir /tmp/hub --model turboderp/Llama-3-70B-Instruct-exl2 --revision 5.0bpw --kv-cache-dtype fp8_e5m2 --gpu-memory-utilization 1.0 --enforce-eager --max-log-len 0
2024-05-06T19:18:03.870671479Z Traceback (most recent call last):
2024-05-06T19:18:03.870689139Z   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
2024-05-06T19:18:03.870694559Z     response.raise_for_status()
2024-05-06T19:18:03.870698449Z   File "/usr/local/lib/python3.10/dist-Packages/requests/models.py", line 1021, in raise_for_status
2024-05-06T19:18:03.870701649Z     raise HTTPError(http_error_msg, response=self)
2024-05-06T19:18:03.870705949Z requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/turboderp/Llama-3-70B-Instruct-exl2/resolve/main/config.json
2024-05-06T19:18:03.870710549Z 
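To illustrate what the 404 shows: the Hub serves raw files at `/resolve/<revision>/<filename>`, and the failing URL contains `main` rather than `5.0bpw`, which is why I believe REVISION is being ignored. A minimal sketch of the URL pattern (the helper `resolve_url` is my own, for illustration only):

```python
from urllib.parse import quote

def resolve_url(repo_id: str, revision: str, filename: str) -> str:
    # Hugging Face Hub serves raw files at /resolve/<revision>/<filename>.
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )

# What the loader should request when REVISION=5.0bpw is honored:
print(resolve_url("turboderp/Llama-3-70B-Instruct-exl2", "5.0bpw", "config.json"))
# What it actually requested (the URL in the 404 traceback above):
print(resolve_url("turboderp/Llama-3-70B-Instruct-exl2", "main", "config.json"))
```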
[Attachment: Screenshot_2024-05-06_at_17.26.23.png]
Solution
Sure, I just made a PR. Please have a look:
https://github.com/PygmalionAI/aphrodite-engine/pull/455

Do you think you could cherry pick this fix for RunPod?
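For reference, applying the PR to the RunPod image branch would look roughly like this (a sketch; the remote name `origin` and your current checkout are assumptions):

```shell
# Fetch the head commit of PR #455 and cherry-pick it onto the
# currently checked-out branch ("origin" is assumed to point at
# the aphrodite-engine repo).
git fetch origin pull/455/head
git cherry-pick FETCH_HEAD
```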