Runpod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

RAG on a serverless LLM

I am running a serverless LLM and want to augment the model with a series of PDF files. On a dedicated GPU I can do this through the web UI by adding knowledge.
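One way to approach this without the web UI is to do the retrieval inside the handler itself. Below is a minimal sketch, not an official RunPod pattern: it assumes the PDFs are baked into the image under `/docs` and that `pypdf` and `sentence-transformers` are installed; the actual LLM call is left as a stub.

```python
# Sketch: retrieval over PDFs inside the handler itself. Assumes the
# PDFs are baked into the image under /docs and that pypdf and
# sentence-transformers are installed; the LLM call is left as a stub.
import glob

import numpy as np
import runpod
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

EMBEDDER = SentenceTransformer("all-MiniLM-L6-v2")

# Chunk and embed every PDF once, at worker start (paid as cold start).
CHUNKS = []
for path in glob.glob("/docs/*.pdf"):
    text = " ".join(page.extract_text() or "" for page in PdfReader(path).pages)
    CHUNKS += [text[i:i + 1000] for i in range(0, len(text), 1000)]
EMBEDDINGS = EMBEDDER.encode(CHUNKS, normalize_embeddings=True)

def handler(job):
    question = job["input"]["prompt"]
    q_vec = EMBEDDER.encode([question], normalize_embeddings=True)[0]
    # Cosine similarity reduces to a dot product on normalized vectors.
    top = np.argsort(EMBEDDINGS @ q_vec)[-3:]
    context = "\n".join(CHUNKS[i] for i in top)
    # Prepend `context` to the prompt for whatever LLM the worker serves.
    return {"context": context, "prompt": question}

runpod.serverless.start({"handler": handler})
```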

Unexpected Infinite Retries Causing Unintended Charges

I recently ran my serverless workload using my custom Docker image on RunPod, and I encountered an issue that resulted in significant unexpected charges. My application experienced failures, and instead of stopping or handling errors appropriately, it kept retrying indefinitely. This resulted in:
- $166.69 charged by OpenAI due to repeated API calls.
- $14.27 charged on RunPod for compute usage. ...
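One way to guard against this is to cap retries inside the handler and return an error payload instead of raising, since a returned `{"error": ...}` marks the job FAILED cleanly. A minimal sketch, with the OpenAI call left as a stub:

```python
# Sketch: cap retries and return an error payload instead of raising,
# so one bad job can't loop forever against a paid API.
import runpod

MAX_ATTEMPTS = 3

def call_openai_once(prompt):
    # Stub: replace with your real OpenAI call; let it raise on failure.
    return f"(stub reply to) {prompt}"

def handler(job):
    prompt = job["input"]["prompt"]
    last_err = None
    for attempt in range(MAX_ATTEMPTS):
        try:
            return {"output": call_openai_once(prompt)}
        except Exception as err:
            last_err = err  # keep the last failure for the report
    # A returned {"error": ...} marks the job FAILED without crashing
    # the worker, which is what can lead to platform-level retries.
    return {"error": f"gave up after {MAX_ATTEMPTS} attempts: {last_err}"}

runpod.serverless.start({"handler": handler})
```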

Serverless vLLM workers crash

Whenever I create a serverless vLLM endpoint (it doesn't matter what model I use), the workers all end up crashing with the status "unhealthy". I checked the vLLM supported-models page and I only use models that are supported. The last time I ran a serverless vLLM, I used meta-llama/Llama-3.1-70B with a proper Hugging Face token that allows access to the model. The result of trying to run the default "Hello World" prompt on this serverless vLLM is in the attached images. A worker has the status...

Meaning of -u1 / -u2 at the end of the request ID?

Would like to know what those mean. I saw -u2 and -u1 on both sync and async requests and couldn't understand what they are.

Ambiguity of handling runsync cancel from python handler side

Hi. What's the best way I can handle the "cancel" signal on the serverless server/handler side? Is the default cancel logic just stopping the container altogether?
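I'm not aware of a documented cancel hook in the Python handler; one workaround is a generator handler, so a cancel can take effect between yielded chunks rather than only when the whole job ends. A sketch, assuming that behavior:

```python
# Sketch: a generator handler yields partial results, so a cancel can
# land between chunks instead of only after the whole job finishes.
# "return_aggregate_stream" is a real SDK option; whether /cancel
# interrupts between yields is an assumption to verify.
import runpod

def handler(job):
    try:
        for step in range(100):
            # ... do one slice of work here ...
            yield {"progress": step}
    finally:
        # Runs if the generator is closed early; release files/GPU here.
        pass

runpod.serverless.start({
    "handler": handler,
    "return_aggregate_stream": True,  # lets /runsync collect the chunks
})
```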

Enabling CLI_ARGS=--trust-remote-code

I am trying to run some of the SOTA models and the error logs tell me that I need to enable this CLI flag. How can I do that?

CUDA profiling

Hey guys, how can I profile kernels on serverless GPUs? Say I have a CUDA kernel; how can I measure its performance using serverless GPUs like RunPod's? ...
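Full profilers such as Nsight may need container capabilities that serverless workers don't necessarily grant, so the low-friction option is CUDA events inside the handler. A sketch using PyTorch, with a matmul standing in for the kernel under test:

```python
# Sketch: CUDA event timing inside a handler; the matmul stands in for
# your kernel. torch.cuda.Event measures device-side elapsed time.
import runpod
import torch

def handler(job):
    n = int(job["input"].get("size", 4096))
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()  # drop pending work from the measurement
    start.record()
    c = a @ b  # the kernel under test; swap in your own op or extension
    end.record()
    torch.cuda.synchronize()  # wait so elapsed_time is valid
    return {"ms": start.elapsed_time(end), "checksum": float(c.sum())}

runpod.serverless.start({"handler": handler})
```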

Serverless handler on Nodejs

Hi. I see there is an official SDK for serverless handlers, but only for Python. I don't see any handler API in the js-sdk.

RunPod Serverless Inter-Service Communication: Gateway Authentication Issues

I'm developing an application with two RunPod serverless endpoints that need to communicate with each other: Service A: A Node.js/Express API that receives requests and dispatches processing tasks Service B: A Python processor that handles data and needs to notify Service A when complete ...
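For the authentication part, every call to `api.runpod.ai/v2/<endpoint_id>/...` must carry your RunPod API key as a Bearer token. A sketch of Service B notifying Service A; the endpoint-ID env var and payload shape are placeholders:

```python
# Sketch: Service B calling Service A's endpoint through the gateway.
# The endpoint ID env var and payload shape are placeholders.
import os

import requests

RUNPOD_API_KEY = os.environ["RUNPOD_API_KEY"]
SERVICE_A_ENDPOINT = os.environ["SERVICE_A_ENDPOINT_ID"]  # placeholder

def notify_service_a(result: dict) -> dict:
    resp = requests.post(
        f"https://api.runpod.ai/v2/{SERVICE_A_ENDPOINT}/runsync",
        headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
        json={"input": {"event": "processing_complete", "result": result}},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```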

RunPod ComfyUI Serverless: Hugging Face Models field does nothing

When deploying a ComfyUI serverless endpoint, the attached screen appears, asking for Hugging Face models. However, when I checked the repo, that setting is not used at all: https://github.com/search?q=repo%3Arunpod-workers%2Frunpod-worker-comfy%20MODEL_NAME&type=code How do I download the required models (.safetensors) and Comfy nodes when deploying an endpoint? ...
Solution:
When you press Next, continue until you reach the environment variables step and check what gets added there. Then you can add the same env vars yourself, using the same Docker image in your own template.

Serverless ComfyUI -> "error": "Error queuing workflow: HTTP Error 400: Bad Request",

I am running Serverless ComfyUI with RunPod and it is not working. Can someone please help? I keep getting:
```
Job response: { "delayTime": 1009, "error": "Error queuing workflow: HTTP Error 400: Bad Request", ...
```

Error 404 on payload download.

Hi guys! I'm trying to download a file to my endpoint for processing, using the RunPod download utility, and sometimes (but not always) I get the message: ...
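A small retry wrapper around the download helper can paper over intermittent 404s while you investigate the source. This sketch assumes `rp_download.file(url)` as in current runpod-python; adjust attempts and backoff to taste:

```python
# Sketch: retry wrapper for intermittent 404s from the download helper.
# Assumes rp_download.file(url) as in current runpod-python.
import time

from runpod.serverless.utils import rp_download

def download_with_retry(url: str, attempts: int = 3, backoff: float = 2.0):
    for i in range(attempts):
        try:
            return rp_download.file(url)  # dict including "file_path"
        except Exception:
            if i == attempts - 1:
                raise  # give up and surface the original error
            time.sleep(backoff * (i + 1))  # linear backoff between tries
```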

Failed Faster-Whisper task

I continue to get this error and I can't figure out what's going on, please help ❤️
```
Job submitted: 3f6a6e02-5249-4faf-9fb3-49ac501c695d-u2
Job failed: {'delayTime': 163, 'executionTime': 184,
 'id': '3f6a6e02-5249-4faf-9fb3-49ac501c695d-u2', 'status': 'FAILED',
 'workerId': '187psw4ygtfrrm', 'error': {"error_type": "<class 'av.error.InvalidDataError'>",
 "error_message": "[Errno 1094995529] Invalid data found when processing input: '/tmp/tmpi89o0mcn.wav'",
 "hostname": "187psw4ygtfrrm-64410c48", "worker_id": "187psw4ygtfrrm", "runpod_version": "1.5.2"}}

error_traceback:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py", line 134, in run_job
    handler_return = handler(job)
  File "/usr/local/lib/python3.10/dist-packages/runpod/serverless/utils/rp_debugger.py", line 165, in __call__
    result = self.function(*args, **kwargs)
  File "/rp_handler.py", line 72, in run_whisper_job
    whisper_results = MODEL.predict(
  File "/predict.py", line 75, in predict
    segments, info = list(model.transcribe(str(audio),
  File "/usr/local/lib/python3.10/dist-packages/faster_whisper/transcribe.py", line 277, in transcribe
    audio = decode_audio(audio, sampling_rate=sampling_rate)
  File "/usr/local/lib/python3.10/dist-packages/faster_whisper/audio.py", line 46, in decode_audio
    with av.open(input_file, metadata_errors="ignore") as container:
  File "av/container/core.pyx", line 401, in av.container.core.open
  File "av/container/core.pyx", line 272, in av.container.core.Container.__cinit__
  File "av/container/core.pyx", line 292, in av.container.core.Container.err_check
  File "av/error.pyx", line 336, in av.error.err_check
av.error.InvalidDataError: [Errno 1094995529] Invalid data found when processing input: '/tmp/tmpi89o0mcn.wav'
```
...
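`InvalidDataError` here means `av` could not decode the temp file, i.e. the input audio was truncated or not audio at all. Probing the file before handing it to faster-whisper lets the job fail with a clean error; a sketch (the path comes from wherever the handler saved the input):

```python
import av  # already in the faster-whisper image, per the traceback

def probe_audio(path: str) -> None:
    """Raise early if `path` is not decodable audio."""
    # Same call the traceback shows failing inside faster-whisper; doing
    # it up front lets the handler return a clean {"error": ...} instead.
    with av.open(path, metadata_errors="ignore") as container:
        if not container.streams.audio:
            raise ValueError(f"no audio stream in {path}")
```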

Delete Serverless Endpoint via the API?

I am trying to delete the serverless endpoint via the API, but every time I make a request to the endpoint, I get an internal error. Via the Python API:
```
delete_endpoint_graphql = """mutation {{...
```
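For comparison, a sketch of the GraphQL call, assuming the mutation is `deleteEndpoint(id: ...)`; verify the exact name against RunPod's GraphQL schema before relying on it:

```python
# Sketch, assuming the mutation is deleteEndpoint(id: ...); verify the
# exact name against RunPod's GraphQL schema before relying on it.
import os

import requests

def delete_endpoint(endpoint_id: str) -> dict:
    query = 'mutation { deleteEndpoint(id: "%s") }' % endpoint_id
    resp = requests.post(
        "https://api.runpod.io/graphql",
        params={"api_key": os.environ["RUNPOD_API_KEY"]},
        json={"query": query},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # inspect for an "errors" key on failure
```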

Terminate worker

Hi y'all, is there any way to terminate a specific (serverless) worker via the API, or as an additional control in the handler return? I don't want to refresh the worker; I just want to terminate it on specific occasions. ...

Is it possible to respond with Transfer-Encoding: chunked?

Hello, I'm using serverless endpoints and currently return a JSON object. Is it possible to, say, directly return a WAV file with Transfer-Encoding: chunked, so the response headers would be Content-Type: audio/wav...
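To my knowledge the queue-based endpoints return JSON through the gateway, so raw chunked binary isn't exposed on `/runsync`; the usual workarounds are base64 in the JSON output or uploading to object storage and returning a URL. A sketch of the base64 route, with a stand-in synthesis step:

```python
# Sketch: base64-encode the WAV into the JSON output. generate_wav is
# a stand-in; replace it with your real synthesis step.
import base64
import io
import wave

import runpod

def generate_wav(params: dict) -> bytes:
    # Stand-in synthesis: one second of silence at 16 kHz mono.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * 16000)
    return buf.getvalue()

def handler(job):
    return {
        "content_type": "audio/wav",
        "audio_b64": base64.b64encode(generate_wav(job["input"])).decode("ascii"),
    }

runpod.serverless.start({"handler": handler})
```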

Disk quota exceeded building serverless from GitHub

Hi, I'm getting a "disk quota exceeded" error when trying to build my RunPod serverless worker from a GitHub repo. It downloads a few models. Is there a maximum quota size? ...
Solution:
Okay, the build has timeouts too, so try to optimize for that.

Ollama serverless?

Is there any easy way to run Ollama over serverless?
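One pattern is to run `ollama serve` as a sidecar process in the worker and proxy prompts to its local REST API. A sketch, assuming the image installs Ollama and pulls the model at build time (11434 is Ollama's default port):

```python
# Sketch: Ollama as a sidecar inside the worker. Assumes the image
# installs ollama and runs `ollama pull <model>` at build time.
import subprocess
import time

import requests
import runpod

subprocess.Popen(["ollama", "serve"])  # one server per worker
time.sleep(5)  # crude; poll http://localhost:11434 in real code

def handler(job):
    resp = requests.post(
        "http://localhost:11434/api/generate",  # Ollama's default port
        json={
            "model": job["input"].get("model", "llama3"),
            "prompt": job["input"]["prompt"],
            "stream": False,  # return one JSON object, not a stream
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()

runpod.serverless.start({"handler": handler})
```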

Serverless docker image deployment

Hi, I fine-tuned a LoRA from Llama 3.2 3B using Unsloth and want to deploy it on serverless. Using vLLM with the merged model degrades performance too much to be of use. I then followed the instructions at https://github.com/runpod-workers/worker-template/tree/main and created a serverless endpoint using the Docker image, but it keeps initializing and never completes a job; the job remains in the queue. I might be missing something. I also don't have much experience with Docker, so I might be making a mistake there, but I did test the Docker image locally before deploying. I would appreciate any help with this. ...
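A job that sits in the queue forever often means the worker never reached `runpod.serverless.start` (a crash on import, or the image not actually running the handler file), so it's worth testing the smallest possible handler first. A sketch matching the worker-template layout; run it locally with `python handler.py --test_input '{"input": {"prompt": "hi"}}'` before pushing the image:

```python
# Sketch: the smallest viable handler, matching the worker-template
# layout. Test locally before pushing the image:
#   python handler.py --test_input '{"input": {"prompt": "hi"}}'
import runpod

def handler(job):
    prompt = job["input"]["prompt"]
    # ... load the merged Llama 3.2 3B + LoRA and generate here ...
    return {"echo": prompt}

runpod.serverless.start({"handler": handler})
```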

Can you now run Gemma 3 in the vLLM container?

In serverless, it seems I'm getting an error; any help on this?