Same request running twice
Why is facebook's 125M model loading into vLLM Quick Deploy even though another model is specified?
serverless workers idle but multiple requests still in the queue
Question about serverless vllm endpoint
Serverless pod tasks stay "IN_QUEUE" forever
CMD ["python", "-u", "runpod.py"]
not getting any serverless logs using runpod==1.6.2
Add Docker credentials to Template (Python code)
Format of video input for vLLM model LLaVA-NeXT-Video-7B-hf
How to view monthly bills for each serverless instance?
Issue with KoboldCPP - official template
How to pass `docker run` args like --ipc=host in serverless endpoints
Is RunPod's Faster Whisper Set Up Correctly for CPU/GPU Use?
Endpoint initializing for eternity (45 GB Docker image)
Llama-3.1-Nemotron-70B-Instruct in Serverless
Job delay
How to get `/stream` serverless endpoint to "stream"?
jobs queued for minutes despite lots of available idle workers
Request stuck because of exponential backoff, what does it mean?
in serverless CPU, after upgrading to runpod SDK 1.7.4, getting lots of "kill worker" errors