Can't find Juggernaut in the list of models to download in ComfyUI Manager
comfy
Incredibly long startup time when running 70B models via vLLM
cognitivecomputations/dolphin-2.9.1-llama-3-70b
… I find it even weirder that the request ultimately succeeds. Logs and a screenshot of the endpoint and template config are attached; if anyone can spot an issue or knows how to deploy 70B models such that they reliably work, I would greatly appreciate it.
Some other observations:
- In support, someone told me that I need to manually set the env var BASE_PATH=/workspace, which I am now always doing (see the sketch after this list)
- I sometimes, but not always, see this in the logs, even though I am deploying a completely different model: AsyncEngineArgs(model='facebook/opt-125m', served_model_name=None, tokenizer='facebook/opt-125m'...
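For reference, a minimal sketch of a RunPod serverless handler that reads BASE_PATH from the environment, assuming the standard runpod Python SDK; the MODEL_NAME variable, its default, and the handler body are hypothetical placeholders for illustration, not the poster's actual endpoint code:

```python
import os

import runpod  # RunPod serverless SDK: pip install runpod

# BASE_PATH mirrors the support advice above; the /workspace default
# and the MODEL_NAME variable are assumptions for illustration only.
BASE_PATH = os.environ.get("BASE_PATH", "/workspace")
MODEL_NAME = os.environ.get(
    "MODEL_NAME", "cognitivecomputations/dolphin-2.9.1-llama-3-70b"
)


def handler(job):
    # job["input"] carries the JSON payload sent to the endpoint.
    prompt = job["input"].get("prompt", "")
    # ... load and run the model from BASE_PATH here ...
    return {"model": MODEL_NAME, "base_path": BASE_PATH, "echo": prompt}


# Registers the handler with the RunPod serverless runtime.
runpod.serverless.start({"handler": handler})
```

If the worker image really does default to facebook/opt-125m, a fail-fast check like the one above would at least surface a missing or ignored model setting in the logs instead of silently serving the wrong model.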
Mounting network storage at runtime - serverless
Serverless fails when workers aren't manually set to active
Chat completion (template) not working with vLLM 0.6.3 + Serverless
Qwen2.5 + vLLM + Open WebUI
RoPE scaling JSON not working
First attempt at serverless endpoint - "Initializing" for a long time
(Flux) Serverless inference crashes without logs.
Same request running twice
Why is facebook/opt-125m loading into vLLM Quick Deploy even though another model is specified?
serverless workers idle but multiple requests still in the queue
Question about serverless vllm endpoint
Serverless pod tasks stay "IN_QUEUE" forever
CMD ["python", "-u", "runpod.py"]
CMD ["python", "-u", "runpod.py"]
not getting any serverless logs using runpod==1.6.2
Add Docker credentials to Template (Python code)
Format of video input for vLLM model LLaVA-NeXT-Video-7B-hf
How to view monthly bills for each serverless instance?
