How to deploy Llama3 on Aphrodite Engine (RunPod)
I have set up the following settings for a pod with 48 GB of RAM.
1) I'm not sure how to enable the Q4 cache; without it, the 5.0bpw quant won't fit. Any advice, please? (See attached.)
2) I get an error that config.json can't be found. It seems the REVISION variable is not being taken into account. The docs say:
REVISION: The HuggingFace branch name; it defaults to the main branch.
I think that's a bug.
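For context, here is a minimal sketch of what the pod's environment settings would look like once REVISION is honored. The REVISION variable is the one quoted from the docs above; the MODEL_NAME variable, the placeholder repo, and the branch name are assumptions for illustration only, so substitute your actual model repo and quant branch:

```shell
# Hypothetical RunPod environment settings (names other than REVISION are assumed).
# REVISION selects the HuggingFace branch; exl2 quants are commonly published
# on per-bitrate branches, so a 5.0bpw quant would live on its own branch.
MODEL_NAME="your-org/your-model-exl2"   # placeholder repo, not a real model
REVISION="5.0bpw"                       # branch holding the 5.0bpw weights

# Echo the selected branch so the setting can be verified in the pod logs.
echo "Using revision: $REVISION"
```

With main as the default, leaving REVISION unset should still resolve config.json from the main branch; the bug described above is that a non-default branch is ignored.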

Solution
Sure, I just opened a PR. Please have a look:
https://github.com/PygmalionAI/aphrodite-engine/pull/455
Do you think you could cherry-pick this fix for RunPod?
