Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning, and GPUs!


prod

Hi guys, we are having a problem with our serverless environment. What is the best way to get this resolved?

Runpod serverless overhead/slow

I have a handler that is apparently running very fast, but my requests are not. I'm hoping to process video frames. I know this is an unconventional use case, but it appears to be working reasonably well, with this one exception:
```
2024-08-11T23:25:55.468886839Z INFO: 127.0.0.1:40312 - "POST /process_frame HTTP/1.1" 200 OK
2024-08-11T23:25:55.469954721Z handler.py :61 2024-08-11 23:25:55,469 Local server request completed in 0.079 seconds...
```
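
The usual first step here is to confirm the slowness is not inside the handler at all: Runpod reports queue wait and cold start separately as delayTime, while executionTime covers only the handler body. A minimal timing sketch, where the process_frame stub is hypothetical:

```
import time

import runpod

def process_frame(frame):
    # Stand-in for the real per-frame model call.
    return {"ok": True}

def handler(job):
    # Time only the work itself; queue wait and cold start show up
    # as delayTime in the job status, not in this measurement.
    start = time.perf_counter()
    result = process_frame(job["input"])
    print(f"handler finished in {time.perf_counter() - start:.3f}s")
    return result

runpod.serverless.start({"handler": handler})
```

If the printed time stays small while end-to-end latency is large, the overhead is in queueing and cold starts rather than in the handler itself.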

Getting an error with workers on serverless

Running a Docker image for ComfyUI and using the ReActor face-swap node:
```
2024-08-11T20:59:36.013496862Z   File "/comfyui/custom_nodes/comfyui-reactor-node/scripts/reactor_faceswap.py", line 91, in process
2024-08-11T20:59:36.013503712Z     result = swap_face(
2024-08-11T20:59:36.013518132Z   File "/comfyui/custom_nodes/comfyui-reactor-node/scripts/reactor_swapper.py", line 230, in swap_face
2024-08-11T20:59:36.013529842Z     source_faces = analyze_faces(source_img)...
```

serverless delay time cost

Does the delay time get charged, or is only the execution time charged?
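
For reference, both numbers are reported per job in the status response, so you can see exactly what each request accrued. A minimal sketch, assuming the standard status route (the endpoint and job IDs are placeholders):

```
import os

import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
JOB_ID = "your-job-id"            # placeholder

resp = requests.get(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{JOB_ID}",
    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    timeout=30,
)
data = resp.json()
# delayTime = queue/cold-start wait, executionTime = handler run time;
# both appear in milliseconds in the status payload.
print(data.get("delayTime"), data.get("executionTime"), data.get("status"))
```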

Deploying BLOOM on Runpod serverless vLLM using OpenAI compatibility, issue with CUDA?

Hey guys, I'm getting this error and I really need help:
```
InternalServerError                       Traceback (most recent call last)
Cell In[42], line 4
      1 def get_translation(claim, model=model):...
```

Confusion with IDLE time

I have a serverless endpoint deployed with an idle timeout of 5 seconds. I expected that after 5 seconds, if I send a new request, the Docker image would be downloaded again. Instead, the effective idle time is much longer: even when I send a request again minutes later, the Docker image is already loaded, resulting in a very quick response. That's good, but I don't understand it.

Does Runpod have an alternative to Ashley Kleynhans' GitHub repository for creating an a1111 worker?

3 days ago, I created a serverless instance using https://github.com/ashleykleynhans/runpod-worker-a1111. It's not accessible anymore. https://github.com/runpod-workers/worker-a1111 doesn't seem to have the same functionality. I want to choose my own endpoint, LoRAs, ADetailer, and an upscaler. What solution do we have now to replace Ashley's work? Has somebody forked his repo?...

Slow network volume

Some people have reported that loading models from network volumes is very slow compared to baking the model into the image itself.

Sticky sessions (?) for cache reuse

In my case, building an AI chat application (duh), it'd be useful to be able to direct a subsequent request to the same node of an ever-scaling endpoint for efficient KV-cache reuse. Is that currently possible with Runpod? Because as I see it now, there is no way to force a specific node when making a request to an endpoint. The question applies both to the vLLM endpoint template and custom handlers.

Timeout error even if a higher timeout is set

I set the timeout to 1200 seconds, although when running the request I get:
```
{
  "delayTime": 36534,
  "error": "ReadTimeout: HTTPConnectionPool(host='127.0.0.1', port=5000): Read timed out. (read timeout=600)",
  "executionTime": 601182,
  "id": "a2bf98f6-2201-44ea-9710-d588521fda45-e1",...
```
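
One thing worth checking: the 600 s in that error looks like a default execution timeout, and the per-request policy may need to carry the higher value as well. A hedged sketch, assuming the executionTimeout field of the request's execution policy (the ID and input are placeholders):

```
import os

import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder

payload = {
    "input": {"prompt": "..."},  # placeholder input
    # Assumption: executionTimeout is given in milliseconds,
    # so 1200 s = 1_200_000 ms.
    "policy": {"executionTimeout": 1_200_000},
}
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    json=payload,
    timeout=30,
)
print(resp.json())
```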

async execution failed to run

Hello everyone, I'm trying to implement a Discord bot that can send requests to a serverless endpoint and run ComfyUI to generate images. I tried to follow the code in the documentation, but the worker is never able to run through the job; instead, it fails to return anything. By changing the code to run_sync, it worked perfectly fine. I have attached my handler and get-image function below....
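
For comparison with run_sync, here is a minimal async submit-then-poll sketch using the runpod Python SDK (the API key, endpoint ID, and input shape are placeholders):

```
import time

import runpod

runpod.api_key = "YOUR_API_KEY"            # placeholder
endpoint = runpod.Endpoint("ENDPOINT_ID")  # placeholder

# Submit asynchronously, then poll, instead of blocking like run_sync does.
job = endpoint.run({"workflow": {}})  # hypothetical ComfyUI input shape
while job.status() not in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
    time.sleep(2)
print(job.status(), job.output())
```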

whisper

I built the Docker image using the official Runpod GitHub repo and I get:
```
2024-08-08T20:20:40.508212050Z {"requestId": null, "message": "{
  "error_type": "<class 'RuntimeError'>",
  "error_message": "Library libcublas.so.12 is not found or cannot be loaded",
  "error_traceback": "Traceback (most recent call last):
    File "/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py", line 134, in run_job
      handler_return = handler(job)
    File "/usr/local/lib/pytho...
```
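
That RuntimeError usually indicates a CUDA-version mismatch between the base image and the installed wheels (libcublas.so.12 belongs to CUDA 12). A quick diagnostic sketch to run inside the container:

```
import ctypes

# Quick check: does CUDA 12's cuBLAS actually resolve in this image?
# An OSError here typically means the base image ships CUDA 11
# (libcublas.so.11) while the installed wheels expect CUDA 12.
try:
    ctypes.CDLL("libcublas.so.12")
    print("libcublas.so.12 loaded fine")
except OSError as err:
    print(f"cannot load libcublas.so.12: {err}")
```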

Can't run a 70B Llama 3.1 model on 2 A100 80 GB GPUs

Hey, so I tried running the 70B Llama model on 2 GPUs/worker, but it keeps getting stuck at the same place every time. If I instead switch to the 8B model on 1 GPU/worker with a 48 GB GPU, it works easily. The issue only comes up with the 70B-parameter model on 2 GPUs/worker.

Can't run 70B

Any tips for running a 70B model, for example mlabonne/Llama-3.1-70B-Instruct-lorablated? I tried this config: 80 GB GPU...
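
On both of the 70B threads above: with two 80 GB GPUs per worker, the weights have to be sharded across both, so vLLM's tensor parallel degree must match the GPU count. A minimal sketch (the model name is from the post; the other numbers are illustrative):

```
from vllm import LLM, SamplingParams

# Assumption: the 70B weights must be sharded across both GPUs, so
# tensor_parallel_size matches the GPU count per worker (here 2).
llm = LLM(
    model="mlabonne/Llama-3.1-70B-Instruct-lorablated",
    tensor_parallel_size=2,
    max_model_len=4096,          # illustrative; a smaller context leaves KV-cache room
    gpu_memory_utilization=0.95,
)

outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```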

Error getting response from a serverless deployment

I tried to create multiple serverless vLLM deployments and even picked the top-end GPU. However, the requests would always go to an in-progress status and would not respond. I'm building a chat app, and such a slow response isn't acceptable. Is there something else I should do? I had selected all default options for the google/gemma-2b model while creating the deployment. I know the requests from my app hit Runpod, as I could see the requests and their status, but it would never respond back. I was try...

Copy network volume contents to another

What is the way to copy the contents of one network volume to another network volume?

Charged while not using service

Spun up a serverless API. Did not use it at all. Got billed $60 since last night. Could you check what caused this behavior? EnterpriseDna Team...

"IN QUEUE" and nothing happeneds

Hello everyone, I'm currently running a TGI container (ghcr.io/huggingface/text-generation-inference:2.2.0) within a serverless environment, alongside my model from Hugging Face. Issue: Although the status indicates "connected," there seems to be no further activity. The logs display various INFO and WARNING messages but do not show any errors. This has left me puzzled as to the root cause of the problem....
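
One common cause of a permanent IN QUEUE state is that a plain TGI container never consumes jobs from the Runpod queue, since nothing inside it speaks the serverless protocol. A hedged sketch of a bridge handler (TGI's default port and /generate route are assumptions; adjust to your container's settings):

```
import requests

import runpod

TGI_URL = "http://127.0.0.1:80/generate"  # assumption: TGI's default port/route

def handler(job):
    # Forward the Runpod job input to the local TGI HTTP server; the input
    # must already match TGI's /generate schema, e.g.
    # {"inputs": "...", "parameters": {...}}.
    resp = requests.post(TGI_URL, json=job["input"], timeout=600)
    resp.raise_for_status()
    return resp.json()

runpod.serverless.start({"handler": handler})
```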

How can I cause models to download on initialization?

```
FROM runpod/pytorch:2.2.1-py3.10-cuda12.1.1-devel-ubuntu22.04
WORKDIR /content...
```
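
One approach, sketched under the assumption that the download should happen once per worker rather than once per request: fetch the weights at module import, before runpod.serverless.start is called (the model ID here is hypothetical). Baking the same download into the image with a RUN step avoids even that one-time cost:

```
import runpod
from huggingface_hub import snapshot_download

# Runs at module import, i.e. once per worker cold start, not per request.
# The model ID is hypothetical; a RUN step in the Dockerfile doing the same
# download would bake the weights into the image instead.
MODEL_DIR = snapshot_download("runwayml/stable-diffusion-v1-5")

def handler(job):
    # By the time the first job arrives, the weights are already on disk.
    return {"model_dir": MODEL_DIR, "echo": job["input"]}

runpod.serverless.start({"handler": handler})
```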

Optimizing Docker Image Loading Times on RunPod Serverless – Persistent Storage Options?

I'm working with a large Docker image on RunPod Serverless, containing several trained models. While I've already optimized the image size, the initial docker pull during job startup remains a bottleneck, as it takes too long to complete. Is there a way to leverage persistent storage on RunPod to cache my Docker image? Ideally, I'd like to avoid the docker pull step altogether and have the image instantly available for faster job execution. Thanks,...
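
As far as I know there is no way to cache the image itself in persistent storage, but the common workaround is to move the heavy weights out of the image onto a network volume so the pull stays small; the trade-off is the slower volume reads mentioned in the "Slow network volume" thread above. A sketch of the handler-side path check (the mount point and directory layout are assumptions):

```
import os

# Assumption: a network volume attached to the endpoint mounts at
# /runpod-volume on serverless workers; the directory layout is made up.
VOLUME_MODEL_DIR = "/runpod-volume/models/my-model"

def resolve_model_dir():
    # Fail fast if the volume is missing, rather than downloading mid-job.
    if os.path.isdir(VOLUME_MODEL_DIR):
        return VOLUME_MODEL_DIR
    raise FileNotFoundError(f"expected model weights at {VOLUME_MODEL_DIR}")

print(resolve_model_dir())
```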