RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Can't find Juggernaut in the list of models to download in ComfyUI Manager

My workflow is deployed on RunPod, but I can't find my ckpt in the ComfyUI Manager to download. Error: Prompt outputs failed validation. Efficient Loader:...

comfy
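
Not from the thread, but a common workaround when a checkpoint isn't listed in ComfyUI Manager is to place the file in ComfyUI's checkpoints folder yourself before the worker starts. A minimal Python sketch, assuming the worker's ComfyUI install lives at /comfyui; the Hugging Face repo id and filename below are assumptions, not verified:

```python
# Sketch: fetch a Juggernaut checkpoint into ComfyUI's checkpoints folder
# so nodes like Efficient Loader can reference it by filename.
# repo_id, filename, and the /comfyui path are assumptions.
import shutil
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="RunDiffusion/Juggernaut-XL-v9",  # hypothetical repo id
    filename="Juggernaut-XL_v9_RunDiffusionPhoto_v2.safetensors",  # hypothetical filename
)
shutil.copy(ckpt_path, "/comfyui/models/checkpoints/")  # assumed ComfyUI model path
```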

getting message 'throttled waiting for GPU to become available' even though I have 4 endpoints selected with high and medium availability.

Incredibly long startup time when running 70b models via vllm

I have been trying to deploy 70B models as a serverless endpoint and observe startup times of almost an hour, if the endpoint becomes available at all. The attached screenshot shows an example of an endpoint that deploys cognitivecomputations/dolphin-2.9.1-llama-3-70b. I find it even weirder that the request ultimately succeeds. Logs and a screenshot of the endpoint and template config are attached - if anyone can spot an issue or knows how to deploy 70B models so that they work reliably, I would greatly appreciate it. Some other observations:
- In support, someone told me that I need to manually set the env BASE_PATH=/workspace, which I am now always doing.
- I sometimes, but not always, see this in the logs: AsyncEngineArgs(model='facebook/opt-125m', served_model_name=None, tokenizer='facebook/opt-125m'..., even though I am deploying a completely different model...
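
Not a verified fix, but for comparison, here is a hedged sketch of pinning the template env for a large model via the Python SDK. The env var names (MODEL_NAME, MAX_MODEL_LEN, BASE_PATH) follow the worker-vllm documentation as I understand it; the image tag, disk size, and value choices are assumptions:

```python
import runpod

runpod.api_key = "YOUR_RUNPOD_API_KEY"

# Sketch: serverless template for a 70B model on the vLLM worker image.
# Env var names follow worker-vllm's docs as I understand them; the image
# tag, disk size, and value choices are illustrative assumptions.
template = runpod.create_template(
    name="dolphin-70b-vllm",
    image_name="runpod/worker-v1-vllm:stable-cuda12.1.0",  # assumed image tag
    is_serverless=True,
    container_disk_in_gb=200,  # assumption: room for ~140 GB of fp16 weights
    env={
        "MODEL_NAME": "cognitivecomputations/dolphin-2.9.1-llama-3-70b",
        "BASE_PATH": "/workspace",  # per the support suggestion above
        "MAX_MODEL_LEN": "8192",    # assumption: modest context to fit VRAM
    },
)
```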

Mounting network storage at runtime - serverless

I am running my own Docker container, and at the moment I'm using the RunPod interface to select network storage, which is then mounted at /runpod-volume. This is OK; however, what I am hoping to do instead is mount the volume at runtime programmatically. Is this in any way possible through libraries or the API? Basically, I would want to list the available volumes and, where a volume exists within the same region as the container/worker, mount it....
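
As far as I know there is no documented way to mount a volume into an already-running serverless worker; the closest programmatic control is attaching a network volume when the endpoint is created. A hedged sketch with the Python SDK, where the parameter names (in particular network_volume_id) and the ids are assumptions to verify against the runpod-python docs:

```python
import runpod

runpod.api_key = "YOUR_RUNPOD_API_KEY"

# Sketch: attach an existing network volume at endpoint-creation time;
# it should then appear at /runpod-volume inside the workers.
# The template/volume ids and the exact parameter names are assumptions -
# check the SDK reference.
endpoint = runpod.create_endpoint(
    name="my-endpoint",
    template_id="TEMPLATE_ID",              # hypothetical template id
    network_volume_id="NETWORK_VOLUME_ID",  # hypothetical volume id
    workers_min=0,
    workers_max=2,
)
```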

Serverless fails when workers aren't manually set to active

As the title says, my requests to my serverless endpoint are retrying/failing at a much higher frequency when my workers aren't set to active. Has anyone experienced something like this before?

Chat completion (template) not working with VLLM 0.6.3 + Serverless

I deployed the https://huggingface.co/xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k model through the Serverless UI, setting the max model context window to 129024 and quantization to awq. I deploy it using the latest version of vLLM (0.6.3) provided by RunPod. I ran into the following errors. Client-side...

qwen2.5 vllm openwebui

I have deployed qwen2.5-7b-instruct using the vLLM quick deploy template (0.6.2), but when Open WebUI connects through the OpenAI API, the RunPod workers log these errors: "code": 400, "message": "1 validation error for ChatCompletionRequest\nmax_completion_tokens\n Extra inputs are not permitted [type=extra_forbidden, input_value=50, input_type=int]\n For further information visit https://errors.pydantic.dev/2.9/v/extra_forbidden", "object": "error", "param": null,...
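
The error says vLLM 0.6.2's ChatCompletionRequest rejects the newer max_completion_tokens field, which Open WebUI apparently sends. For reference, a minimal sketch of a request that uses the older max_tokens field instead, assuming the /openai/v1 route format from RunPod's docs (ENDPOINT_ID and the model name are placeholders):

```python
from openai import OpenAI

# Sketch: call the endpoint's OpenAI-compatible route with max_tokens,
# which vLLM 0.6.2 accepts, instead of max_completion_tokens.
# The /openai/v1 URL format follows RunPod's docs as I recall it;
# ENDPOINT_ID and the model name are placeholders.
client = OpenAI(
    api_key="YOUR_RUNPOD_API_KEY",
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",
)

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=50,  # older field name; max_completion_tokens triggers the 400
)
print(resp.choices[0].message.content)
```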

Rope scaling JSON not working

When I try to use rope scaling with the JSON that works fine in my own vLLM, it errors out on serverless. I tried setting it to just 'type' as well, but this produces the same error. {"factor":4,"original_max_position_embeddings":32768,"rope_type":"yarn"} Here is the log:...
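
For comparison, this is roughly how the same dict is consumed by a local vLLM engine, which is what the serverless worker has to forward. A minimal sketch, assuming a recent vLLM where the key is rope_type (the model name and max_model_len are placeholders, not from the thread):

```python
from vllm import LLM

# Sketch: the same rope-scaling dict passed directly to the engine args.
# Model name and max_model_len are placeholders; older vLLM versions used
# the key "type" instead of "rope_type".
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4,
        "original_max_position_embeddings": 32768,
    },
    max_model_len=131072,
)
```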

First attempt at serverless endpoint - "Initializing" for a long time

Hi. I'm new to RunPod, trying to run a serverless endpoint with a worker based on https://github.com/blib-la/runpod-worker-comfy, and not able to get it past the "Initializing" status. There are NO logs anywhere in the console. Here's what I did:...

(Flux) Serverless inference crashes without logs.

Hi all! I've built a FLUX inference container on RunPod serverless. It works (sometimes), but I get a lot of random failures and RunPod does not return the error logs to me. E.g. this is the response: ...

Same request running twice

Hi, my request finished a successful run, and then the same worker received the same request again and ran it. How can I fix this issue?...

serverless workers idle but multiple requests still in the queue

I have set scaling to spin up a new worker when a request has been in the queue for 30 seconds, but no new idle worker starts beyond the active workers, despite there being multiple requests in the queue for more than 90 seconds.

Question about serverless vllm endpoint

I would like to deploy Qwen2VL-2B using vLLM serverless. I know that it will create an endpoint that I can use to send a prompt, but I wonder if I can also send an image with the prompt?
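
From what I can tell, vision models served through the worker's OpenAI-compatible route accept images as image_url content parts, so something like the sketch below should work. The /openai/v1 URL format, ENDPOINT_ID, and the model name are assumptions/placeholders:

```python
import base64
from openai import OpenAI

# Sketch: Qwen2-VL accepts images via the OpenAI-style "image_url" content
# part; this sends a local image as a base64 data URL.
client = OpenAI(
    api_key="YOUR_RUNPOD_API_KEY",
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",
)

with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="Qwen/Qwen2-VL-2B-Instruct",  # assumed served model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```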

Serverless pod tasks stay "IN_QUEUE" forever

I have a TTS model that I've deployed flawlessly as a RunPod Pod, and I want to convert it to a serverless endpoint to save costs. I made an initial attempt, but when I send a request to the deployed serverless endpoint, the task just stays "IN_QUEUE" forever. The last line of my Dockerfile is
CMD ["python", "-u", "runpod.py"]
CMD ["python", "-u", "runpod.py"]
...
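
For reference, the RunPod serverless worker contract is a handler registered through runpod.serverless.start; without a running handler, requests just wait in the queue. A minimal sketch is below. One thing worth double-checking (an assumption about the setup above, not a confirmed diagnosis): a script named runpod.py can shadow the runpod SDK when it does import runpod, so a different filename such as rp_handler.py is safer.

```python
# rp_handler.py - minimal RunPod serverless handler sketch.
# Naming this file runpod.py would shadow the SDK import below.
import runpod

def handler(job):
    # job["input"] carries whatever was posted under "input" in the request.
    text = job["input"].get("text", "")
    # ... run TTS inference here and return a JSON-serializable result ...
    return {"echo": text}

# Register the handler so the worker starts pulling jobs from the queue.
runpod.serverless.start({"handler": handler})
```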

not getting any serverless logs using runpod==1.6.2

I had this problem with runpod==1.7.x a week or two ago and was told to downgrade to 1.6.2, which worked. As of today, logs have stopped appearing again.

Add Docker credentials to Template (Python code)

I'm struggling to find out how to add my Docker credentials to the template (Python code). I have the credentials added to the settings in Docker, but I can't find how to add them to the template. Does anyone know how to do that? template = runpod.create_template( name=deployment_name, **TEMPLATE_CONFIG...
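
A hedged sketch of what this might look like with the Python SDK, assuming the template call exposes a registry-auth id parameter that points at credentials saved in the RunPod console's container registry settings; the parameter name and how the id is obtained are assumptions to verify against the SDK docs:

```python
import runpod

runpod.api_key = "YOUR_RUNPOD_API_KEY"

# Sketch: reference saved registry credentials when creating the template.
# The registry_auth_id parameter name is an assumption about the SDK;
# check the runpod-python reference for the exact signature.
template = runpod.create_template(
    name="my-private-image-template",
    image_name="myorg/private-worker:latest",  # hypothetical private image
    is_serverless=True,
    registry_auth_id="YOUR_CONTAINER_REGISTRY_AUTH_ID",  # assumed parameter name
)
```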

Format of video input for vLLM model LLaVA-NeXT-Video-7B-hf

Dear Discord members, I have a question about using the vLLM template with the Hugging Face LLaVA-NeXT-Video-7B-hf model on text+video multi-modal input. Video input is a fairly new feature in the vLLM library, and I cannot find definitive information on how I should encode the input video so that the running model instance decodes it into the format it understands. The online vLLM AI chatbot suggested a vector of JPEG-encoded video frames, but that did not work. The vLLM GitHub gave me the impression that a NumPy array is the right solution, but this does not work either....

How to view monthly bills for each serverless instance?

I am currently running multiple serverless instances at the same time, and I need to see how much each of my serverless instances costs in a month (or day, or week) so that I can balance my priorities in the development process. I found the "Billing" section in RunPod, and scrolling down, there is a "Billing Explorer / RunPod Endpoints" section as shown in the picture, but it does not display anything (even though I have spent over 300 USD on RunPod in 2 months). May I ask why nothing is showing up, whether I did something wrong, and whether there's any other way to check the bill for each serverless instance? Any answers would be greatly appreciated; please share whatever information you have ❤️...