RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Some serverless requests are hanging forever

I'm not sure why, but I "often" (often enough) have jobs that just ... hang there even when multiple GPUs are available on my serverless endpoint. New jobs might come in and go through while the old job just "stalls" there. Any idea why?...
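
One workaround, assuming RunPod's per-request execution policy behaves as documented, is to cap how long any single job may run so a stalled job is killed instead of hanging forever. The endpoint ID and API key below are placeholders:

```python
# Hedged sketch: attach an execution-timeout policy (milliseconds) to
# the request so a stuck job is terminated rather than hanging.
# <ENDPOINT_ID> and <API_KEY> are placeholders.
import requests

resp = requests.post(
    "https://api.runpod.ai/v2/<ENDPOINT_ID>/run",
    headers={"Authorization": "Bearer <API_KEY>"},
    json={
        "input": {"prompt": "hello"},
        "policy": {"executionTimeout": 120_000},  # kill jobs running past 2 minutes
    },
)
print(resp.json())
```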

Job retry after successful run

My endpoint has started retrying every request, even though the first run completes successfully without any errors. I don't understand why that is happening. This is what I see in the logs when the first run finishes and the retry starts: 2024-10-10T11:51:52.937738320Z {"requestId": null, "message": "Jobs in queue: 1", "level": "INFO"}...

Why is the delay time so long even though I have an active worker?

I have set active workers to 1 and am manually testing the response delay. I submit the next task only after the previous task has completed, so there should be no waiting time. However, the delay time is often still very long, sometimes exceeding 4 seconds. Why is this? In my code, the model is loaded before runpod.serverless.start({"handler": run})...
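
For reference, the layout described above (and the one that avoids per-request loading) looks roughly like this; load_model and model.generate are placeholders for the poster's own code. If the handler already looks like this, the remaining delay most likely comes from scale-up or cold starts rather than the handler itself:

```python
# Minimal sketch: heavy initialization at import time, so a warm worker
# pays the cost once. load_model()/generate() are placeholders.
import runpod

model = load_model()  # placeholder: runs once per worker, not per job

def run(job):
    prompt = job["input"]["prompt"]
    return {"output": model.generate(prompt)}  # placeholder inference call

runpod.serverless.start({"handler": run})
```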

Keeping Flashboot active?

It is my understanding that FlashBoot is only active for "a while" after each request, and is then disabled as the instance goes into a deeper sleep. Sadly, for me it takes a whopping 70-90 seconds of delay to cold start after a long idle period (running llama-2-13b-chat-hf on the 48GB GPUs, e.g. A40). I don't know if I am doing something wrong, as I see others on this forum getting much, much faster start times. However, on consecutive jobs the delay drops down to 1-3 seconds. What is t...

Hugging Face token not working

Hello! Has anyone had issues getting their Hugging Face token to work on a serverless vLLM instance? I have used Hugging Face before and my tokens work locally, but I keep getting access-denied entries in the console logs when I send a request, even though I supply the token key...
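
A quick way to rule out the token itself before blaming the endpoint is to validate it locally with huggingface_hub (the repo id below is just an example of a gated model; on the serverless side the vLLM worker typically reads the token from the HF_TOKEN environment variable):

```python
# Sanity-check the token locally; model_info raises if access is denied.
# The repo id is an example, not a required value.
from huggingface_hub import whoami, model_info

token = "hf_..."  # the same token given to the endpoint
print(whoami(token=token))  # confirms the token is valid and who owns it
print(model_info("meta-llama/Llama-2-7b-hf", token=token).id)
```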

Pod stuck when starting container

Yesterday I updated my serverless endpoint with the "New release" button. However, when the next request came in, the worker got stuck trying to start the container and drained the remaining funds from my account. In the logs I see multiple "worker exited with exit code 0" errors. It's probably something wrong with my container, but it would be nice if, after multiple failed attempts to start the container, the worker stopped automatically instead of draining money....

Local Testing: 405 Error When Fetching From Frontend

Hi, I am trying to test my handler function by fetching data from my frontend (running on localhost:3000). I am running the local RunPod test server (FastAPI) and trying to make requests to it. However, I keep running into a 405 error. My curl requests work great; however, I need to test my backend from the frontend. I can't find documentation showing how to allow requests from localhost:3000 -- normally I would just add a relaxed CORS policy, but I am not sure how to do that with RunPod. I have tried quite a few different fetch requests, including with my API key, but nothing works. For reference, here is what I am currently doing in my Next.js frontend: const header_data = { input: { subjob: "root",...
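
The 405 is likely the browser's CORS preflight: fetch from localhost:3000 sends an OPTIONS request first, which the local test server may not handle. One workaround (an assumption, not an official RunPod feature) is a thin FastAPI proxy with a relaxed CORS policy in front of the test server:

```python
# Hypothetical CORS proxy for local testing. Assumes the RunPod test
# server (python handler.py --rp_serve_api) listens on localhost:8000.
# Run with: uvicorn proxy:app --port 8001, then point the frontend here.
import httpx
from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],  # the Next.js dev server
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.post("/runsync")
async def proxy(request: Request):
    body = await request.json()
    async with httpx.AsyncClient(timeout=None) as client:
        resp = await client.post("http://localhost:8000/runsync", json=body)
    return resp.json()
```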

Automatic1111 upscaling through API

I have an Automatic1111 endpoint, and I am trying to run the following request using the SD upscale script: `{ "input": { "sd_model_checkpoint": "", "sd_vae": "",...

Can we run Node.js on a Serverless Worker?

According to the Serverless Overview doc page (https://docs.runpod.io/serverless/workers/overview), you can write functions in the language you're most comfortable with. There's a RunPod SDK on NPM (https://www.npmjs.com/package/runpod-sdk), but that looks like it's meant for calling existing endpoints, not for creating handler functions. Is this possible? If so, are there any templates available for creating the handler function in Node.js?...

Microsoft Florence-2 model in serverless container doesn't work

I'm trying to use Florence-2 models in a ComfyUI workflow with a serverless container, and it fails with this error: raise RuntimeError(f'{node_type}: {exception_message}')\nRuntimeError: DownloadAndLoadFlorence2Model: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate`. The Accelerate library is already installed in the venv on the network storage where ComfyUI runs, and I also installed it in the Docker container. Does anyone know how to solve this problem? Thanks in advance...
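
Since the error suggests the import is happening in a different environment than the one where accelerate was installed, a small diagnostic dropped into the worker (e.g. at the top of the handler) can confirm which interpreter ComfyUI actually runs under:

```python
# Diagnostic sketch: print which interpreter is running and whether
# `accelerate` is importable from it.
import sys
import importlib.util

print("interpreter:", sys.executable)
spec = importlib.util.find_spec("accelerate")
print("accelerate found at:", spec.origin if spec else None)
```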

Terrible performance - vLLM serverless for Mistral 7B

Hello, when I serve Mistral-7B quantized with AWQ (e.g. "TheBloke/Mistral-7B-v0.1-AWQ") on RunPod's vLLM serverless instance, I get terrible performance (accuracy) compared to running Mistral 7B on my CPU with ollama (which uses GGUF Q4_0 quantization). Could this be due to a misconfiguration of the parameters on my side (although I kept the defaults), or is AWQ quantization known to drop quality that much? Thank you...
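
One thing worth checking before blaming AWQ is that both setups sample the same way, since ollama and vLLM default to different sampling parameters. Below is a hedged sketch of pinning them explicitly on the request; the input shape follows the worker-vllm README as I understand it, and the endpoint id and key are placeholders:

```python
# Pin sampling parameters so the AWQ endpoint and the local ollama run
# are compared like for like. Placeholders: <ENDPOINT_ID>, <API_KEY>.
import requests

resp = requests.post(
    "https://api.runpod.ai/v2/<ENDPOINT_ID>/runsync",
    headers={"Authorization": "Bearer <API_KEY>"},
    json={
        "input": {
            "prompt": "Explain AWQ quantization in one paragraph.",
            "sampling_params": {
                "temperature": 0.7,
                "top_p": 0.9,
                "max_tokens": 256,
            },
        }
    },
)
print(resp.json())
```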

New release on frontend changes ALL endpoints

When I publish a new release to one endpoint through the RunPod website, it pushes the release to all my other endpoints as well. This has messed up a few of my workflows.

Endpoints vs. Docker Images vs. Repos

Hi, I am new to both Docker and RunPod, so my apologies if this question is overly obvious. I am trying to convert a FastAPI app into a RunPod serverless endpoint. My question is, given that my FastAPI app has many endpoints, how can I access all those endpoints from just one RunPod serverless endpoint? Does it make more sense to create a serverless endpoint for every RESTful endpoint in my FastAPI app? Would I then need to create a different docker image for each endpoint? I've spent a good amount of time looking through the docs, and most of the examples seem to use only one endpoint. Any resources you could point me to would be greatly appreciated. Thanks for your help!...
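
A common pattern for this (a sketch, not the only way) is to keep one Docker image and one serverless endpoint, and dispatch inside a single handler on a field in the request input; the "route" field and helper functions below are hypothetical:

```python
# Sketch: collapse several REST routes into one RunPod handler by
# dispatching on a hypothetical "route" field in the job input.
import runpod

def create_user(params):  # hypothetical business logic
    return {"created": params.get("name")}

def get_user(params):  # hypothetical business logic
    return {"user": params.get("id")}

ROUTES = {"create_user": create_user, "get_user": get_user}

def handler(job):
    payload = job["input"]
    route = ROUTES.get(payload.get("route"))
    if route is None:
        return {"error": f"unknown route {payload.get('route')!r}"}
    return route(payload.get("params", {}))

runpod.serverless.start({"handler": handler})
```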

Serverless Streaming Documentation

I'm using the RunPod GitHub template for my model and it's working - but how would I set it up so that my model streams its output back through RunPod?
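
The Python SDK supports streaming by making the handler a generator; here is a minimal sketch (the token source is a stand-in for real model streaming), with partial results then readable from the endpoint's /stream/{job_id} route:

```python
# Sketch of a streaming handler: yield partial results instead of
# returning once. The word-split is a stand-in for model token output.
import runpod

def handler(job):
    prompt = job["input"]["prompt"]
    for token in prompt.split():  # stand-in for a real token stream
        yield token

runpod.serverless.start({
    "handler": handler,
    "return_aggregate_stream": True,  # also aggregate results for /run requests
})
```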

Serverless or Regular Pod? (How good is Flashboot?)

Hello, I’m a new user of RunPod, and I’m using it for image generation AI. I’m planning to create an API based on the ComfyUI workflow I’ve developed, so people can enter prompts on my website and receive the generated images. However, I’m not sure whether I should use Serverless or just keep a regular Pod running 24/7 and manually create the API there. ...

Errors in container

I am receiving some weird container errors on some serverless endpoints I have. An example is below. They seem to have fixed themselves after I terminated the worker, but I wanted to ask whether there are any known issues: Error response from daemon: Container XYZ124453 is not paused

🚨 HELP! Struggling with Super Slow Docker Pulls from Azure ACR on Runpod 🚨

Hey everyone! 😩 Is anyone else experiencing snail-paced download speeds when pulling Docker images from Azure ACR on Runpod? 🐢 It's taking ages, and it's seriously slowing down my workflow. Any tips, tricks, or solutions would be massively appreciated! 🙏 Details: Docker Repo: Hosted on Azure ACR...

Reporting/blacklisting poorly performing workers

I've noticed that every now and then a bad worker is spawned for my endpoint that takes far longer to complete a job than other workers running the same job. Typically my job takes ~40s, but occasionally a worker with the same GPU takes 70s instead. I want to blacklist these pods from running my endpoint so performance isn't impacted.