How to deploy Llama3 on Aphrodite Engine (RunPod)

I have set up the following settings for a pod with 48 GB of RAM.
1) I'm not sure how to enable the Q4 cache; without it, the 5.0bpw quant won't fit. Any advice, please? (See attached)
2) I get an error that config.json can't be found. It seems the REVISION variable is not being taken into account. The docs say:

REVISION: The HuggingFace branch name, it defaults to the main branch.

I think that's a bug.
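For reference, this is roughly the template configuration I'm aiming for. It's a sketch: apart from REVISION (which the docs name), the variable names are assumptions inferred from the CLI flags visible in the log below.

```shell
# RunPod template environment variables (sketch; names other than REVISION
# are assumptions inferred from the flags in the launch command below)
MODEL_NAME=turboderp/Llama-3-70B-Instruct-exl2
REVISION=5.0bpw              # HuggingFace branch holding the 5.0bpw quant
KV_CACHE_DTYPE=fp8_e5m2      # fp8 KV cache; unclear how to select Q4 instead
GPU_MEMORY_UTILIZATION=1.0
```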

2024-05-06T19:18:00.996748225Z Starting Aphrodite Engine API server...
2024-05-06T19:18:00.996849854Z + exec python3 -m aphrodite.endpoints.openai.api_server --host 0.0.0.0 --port 7860 --download-dir /tmp/hub --model turboderp/Llama-3-70B-Instruct-exl2 --revision 5.0bpw --kv-cache-dtype fp8_e5m2 --gpu-memory-utilization 1.0 --enforce-eager --max-log-len 0
2024-05-06T19:18:03.870671479Z Traceback (most recent call last):
2024-05-06T19:18:03.870689139Z   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
2024-05-06T19:18:03.870694559Z     response.raise_for_status()
2024-05-06T19:18:03.870698449Z   File "/usr/local/lib/python3.10/dist-Packages/requests/models.py", line 1021, in raise_for_status
2024-05-06T19:18:03.870701649Z     raise HTTPError(http_error_msg, response=self)
2024-05-06T19:18:03.870705949Z requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/turboderp/Llama-3-70B-Instruct-exl2/resolve/main/config.json
2024-05-06T19:18:03.870710549Z 
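To illustrate what the 404 shows: the Hub serves raw files at `/resolve/<revision>/<filename>`, and the failing URL contains `main` rather than `5.0bpw`, which is why I believe REVISION is being ignored. A minimal sketch of the URL pattern (the helper `resolve_url` is my own, for illustration only):

```python
from urllib.parse import quote

def resolve_url(repo_id: str, revision: str, filename: str) -> str:
    # Hugging Face Hub serves raw files at /resolve/<revision>/<filename>.
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )

# What the loader should request when REVISION=5.0bpw is honored:
print(resolve_url("turboderp/Llama-3-70B-Instruct-exl2", "5.0bpw", "config.json"))
# What it actually requested (the URL in the 404 traceback above):
print(resolve_url("turboderp/Llama-3-70B-Instruct-exl2", "main", "config.json"))
```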
[Attachment: Screenshot_2024-05-06_at_17.26.23.png]
Solution
Sure, I just made a PR. Please have a look:
https://github.com/PygmalionAI/aphrodite-engine/pull/455

Do you think you could cherry pick this fix for RunPod?
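For reference, applying the PR to the RunPod image branch would look roughly like this (a sketch; the remote name `origin` and your current checkout are assumptions):

```shell
# Fetch the head commit of PR #455 and cherry-pick it onto the
# currently checked-out branch ("origin" is assumed to point at
# the aphrodite-engine repo).
git fetch origin pull/455/head
git cherry-pick FETCH_HEAD
```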