Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

CPU Instances on 64 / 128 vCPUs FAIL

I can deploy my app on all instances except the 64 and 128 vCPU ones. Both of these run on the AMD EPYC 9754 128-Core Processor. When it tries to run, it gets stuck in QUEUE with the error pasted below, then just loops between "start container" and "failed to create shim task: the file python was not found: unknown". Any ideas what is causing this and how to resolve it? A similar issue is reported in the pods section, but I am using serverless and getting the same problem. ERROR f...

JS Puppeteer issue

Hi, I have a Puppeteer Chromium task, written in TypeScript, that I want to run on Runpod. I did not see a JS handler, so I am calling the JS entrypoint via Python as below....
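The excerpt is truncated, but the Python-to-Node bridge it describes can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the script path `dist/scrape.js` and the stdin/stdout JSON contract are assumptions.

```python
# Minimal sketch of bridging a TypeScript/Puppeteer task into a Python
# Runpod serverless handler: the handler shells out to the compiled JS
# entrypoint and passes the job input over stdin as JSON.
# The script path "dist/scrape.js" and its JSON-over-stdio contract
# are assumptions for illustration.
import json
import subprocess


def call_js_entrypoint(job_input, cmd=("node", "dist/scrape.js"), timeout=300):
    """Run the JS entrypoint, feeding job_input as JSON on stdin and
    parsing its stdout as JSON."""
    result = subprocess.run(
        list(cmd),
        input=json.dumps(job_input),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        # Surface the Node-side error to the caller instead of crashing.
        return {"error": result.stderr.strip()}
    return json.loads(result.stdout)


def handler(job):
    # Runpod passes the request body under job["input"].
    return call_js_entrypoint(job["input"])


# In the real worker you would hand this to the SDK:
# import runpod
# runpod.serverless.start({"handler": handler})
```

Keeping the Node invocation behind a small function like this also makes the bridge easy to test without Chromium installed.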

PyTorch Lightning training with DDP strategy crashes with no error caught on multi-GPU worker

It looks like the serverless worker crashes when spawning new processes from the handler. It crashes right after the first process is spawned: "Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2". The same code works fine in a multi-GPU pod web terminal.
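One hedged workaround sketch (not a confirmed fix from the thread): rather than letting Lightning spawn DDP ranks inside the handler process itself, launch the training script as its own process so that process spawning happens outside the handler. The script name `train.py` and its CLI are assumptions for illustration.

```python
# Sketch: run DDP training in a separate process launched from the handler,
# so Lightning's process spawning does not happen inside the serverless
# handler process. "train.py" and its arguments are hypothetical.
import subprocess
import sys


def launch_training(script, args=(), python=sys.executable):
    """Run the training script in its own process and capture its output."""
    proc = subprocess.run(
        [python, script, *args], capture_output=True, text=True
    )
    return {
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


def handler(job):
    # Hypothetical handler: the job input selects training arguments.
    result = launch_training("train.py", args=tuple(job["input"].get("args", [])))
    if result["returncode"] != 0:
        return {"error": result["stderr"]}
    return {"log": result["stdout"]}
```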

Quick question about what "Extra workers" are

I hit a serverless endpoint a bunch of times in succession, and it spun up two other workers, which is GREAT! But it spun up "Extra Workers" instead of "Latest Workers". That prompted me to wonder what the "Extra Workers" are even for....

Parallel processing images with different prompt

Hi! I am running A1111 on serverless. Is it possible to generate images in parallel with different prompts? As far as I know, in the SD web UI you can only set the batch size, but that uses the same prompt for every image, and it also needs an external queue manager....
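If the worker accepts one prompt per request, different prompts can run in parallel by submitting one job per prompt concurrently from the client, instead of relying on the web UI's batch size. A minimal sketch, where the payload shape and the placeholder endpoint ID / API key are assumptions:

```python
# Client-side fan-out: one serverless job per prompt, submitted concurrently.
# submit(payload) stands in for the HTTP call to the endpoint.
from concurrent.futures import ThreadPoolExecutor


def run_parallel(prompts, submit, max_workers=4):
    """Fan out one request per prompt; submit(payload) does the HTTP call."""
    payloads = [{"input": {"prompt": p}} for p in prompts]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(submit, payloads))


# A real submit() might look like this (endpoint ID and key are placeholders):
# import requests
# def submit(payload):
#     r = requests.post(
#         "https://api.runpod.ai/v2/<ENDPOINT_ID>/runsync",
#         headers={"Authorization": "Bearer <API_KEY>"},
#         json=payload,
#         timeout=600,
#     )
#     return r.json()
```

Injecting `submit` keeps the fan-out logic separate from the HTTP layer, so the same helper works with `runsync` or the async `run`/`status` routes.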

SDXL Serverless Worker: How to Cache LoRA models

In this code from the GitHub SDXL serverless worker repo, how do I cache LoRA models and get their paths to use in my handler function? `# builder/model_fetcher.py import torch...
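The repo's snippet is truncated above, but the general caching pattern can be sketched: fetch the LoRA file once (at build time or on first use) into a fixed directory, and have the handler resolve the cached path. The cache directory, file naming, and download helper here are all assumptions for illustration, not the repo's actual code.

```python
# Hedged sketch of caching LoRA weights for an SDXL serverless worker, in
# the spirit of builder/model_fetcher.py. Paths and helpers are hypothetical.
import os

LORA_CACHE_DIR = "/runpod-volume/loras"  # or a path baked into the image


def cached_lora_path(name, cache_dir=LORA_CACHE_DIR):
    """Return the local path a LoRA file is cached under."""
    return os.path.join(cache_dir, f"{name}.safetensors")


def fetch_lora(name, url, cache_dir=LORA_CACHE_DIR, download=None):
    """Download the LoRA once; later calls hit the cache."""
    path = cached_lora_path(name, cache_dir)
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        download(url, path)  # e.g. urllib.request.urlretrieve
    return path


# In the handler, the cached file can then be attached to the pipeline, e.g.:
# pipe.load_lora_weights(cached_lora_path("my_style"))
```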

how to deploy custom image gen model on serverless?

https://blog.runpod.io/custom-models-with-serverless-and-automatic-stable-diffusion/ I read this, but can I do it without A1111? Would that make it simpler to set up? I want to deploy a custom image model for image generation on serverless. I plan on just using the SDXL API....
Solution:
For A1111, this one is better; it's more up to date: https://github.com/ashleykleynhans/runpod-worker-a1111...

Llama 70B loaded on the vLLM serverless template can't answer a simple question like "what is your name"

I am loading with 1 worker and 2 × 80 GB GPUs, but the model just can't perform at all; it gives gibberish answers to simple prompts like "what is your name"...

Now getting a "cors" response from serverless? It never used to; has there been a release?

Has there been a new release? I'm now getting a 500 status with type: cors in the response.

runpodctl project dev auto-installs all dependencies in the dev venv, but runpodctl project deploy does not

Please see the screenshot: controlnet_aux is in the dev venv, but not in the deploy venv. Should I manually install the dependencies in the prod environment? I initially thought they would be installed there automatically, since most of them were.

flashboot adding cost?

Hello everyone! Just a quick question about FlashBoot: does it add additional cost to serverless? I can't seem to find anything definitive to answer this question....
Solution:
No, there is no additional cost for flashboot

Workers deployed with wrong GPU

In 'worker configuration', I've selected '48 GB GPU' (A6000, A40). Upon executing an 'endpoints query' (from the documentation: https://docs.runpod.io/sdks/graphql/manage-endpoints "View your Endpoints") to view all of them, the corresponding endpoint ID shows RTX 4090 and A40 as the worker's GPUs. I tried using a POST request through cURL with the corresponding IDs (from the documentation: https://docs.runpod.io/sdks/python/apis "Get GPUs"), but the workers do not have any GPUs assigned to them. The...
Solution:
Hi, I tried the GraphQL API and it works with this request: ```graphql mutation { saveEndpoint(input: { # options for gpuIds are "AMPERE_16,AMPERE_24,AMPERE_48,AMPERE_80,ADA_24"...
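The mutation in the solution is truncated, but the request body it implies can be assembled from Python and POSTed to the GraphQL API. The exact `EndpointInput` fields are not shown in the excerpt, so treat the field names below as assumptions to be checked against the GraphQL docs linked above.

```python
# Hedged sketch: build the JSON body for the saveEndpoint mutation.
# Field names ("id", "gpuIds") are assumed from the excerpt, not verified.
def build_save_endpoint_payload(endpoint_id, gpu_ids):
    """Assemble the GraphQL request body for saveEndpoint."""
    query = (
        "mutation ($input: EndpointInput!) "
        "{ saveEndpoint(input: $input) { id gpuIds } }"
    )
    return {
        "query": query,
        "variables": {"input": {"id": endpoint_id, "gpuIds": gpu_ids}},
    }


# The payload can then be POSTed as JSON to the Runpod GraphQL endpoint,
# authenticating with your API key, e.g. via requests.post(url, json=payload).
```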

More than 5 workers

Hello, I'm currently being limited by the 5 workers max limit. How can I increase that limit? ...
Solution:
press arrow icon

Money credited for unknown reason

Yesterday I received a message about low balance, so I credited $50, but this is what I see the next morning:

Serverless Instance stuck initializing

Hey I'm trying to spin up a serverless instance, I've had it working fine, but as soon as I try to attach a volume that I'm using as a cache, it is stuck initializing indefinitely. I have workers available, and nothing else has changed. Has anyone else had this issue?...
Solution:
Got it working: EU-RO-1 seems most suitable, and my instance is now initializing as expected. Thanks for the help @nerdylive, legend!

Worker logs say this to all requests: API: unreachable.. retrying in 100ms

Logs of one of my workers say "API: unreachable.. retrying in 100ms" for all requests; see the screenshot attached. Runpod's API then replies with the following: { "delayTime": 10776, "executionTime": 10206,...

Workers configuration for Serverless vLLM endpoints: 1 hour lecture with 50 students

Hey there, I need to show 50 students how to do RAG with open-source LLMs (e.g., Llama 3). Which configuration do you suggest? I want to make sure they have a smooth experience. Thanks!
Solution:
16GB isn't enough, you need 24GB

JS endpoint?

Hey, is it not possible to run JS as serverless code?
Solution:
I think the JS SDK doesn't support serverless yet

All pods unavailable | help needed for a future-proof strategy

Region eu-se-1 has all pods unavailable for serverless. I need to protect against this because of SLA requirements, but it's hard because I literally don't know how or where to read about it. On Monday a $1000–2000/month need is expected, so I would love some help. Maybe I am stupid, but I will have to look for alternatives; I'm of course a bit stressed. Hope you figure it out, and/or can help me avoid and monitor this problem in the future....

Endpoint stuck in init

Hi! My serverless endpoint has been initializing for many hours now and I haven't changed anything! It has been working for the past month. 🤔 Any ideas?...