Runpod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

Questions on large LLM hosting

1. I see mentions of keeping a model in a Network Volume to share between all endpoints. But if I already have my model inside of a container image, wouldn't my model already be cached in that image? Which would be faster for cold boots?
2. ...

Help with instant ID

Hello everyone, I am a Computer Engineering student with a basic coding understanding, very new to the AI space, and I need to do a project where I use something like InstantID to generate pictures of people. I think I can figure out how to get the model working using the example code, but I can't figure out how or where I can use custom models or LoRAs to get the style I want. I also want it to be run from an API of sorts, because the final product won't have sufficient hardware. Does anyone know what service I can use for this that has clear instructions? I have looked into Replicate, but there I don't seem to be able to add custom models at all, so I think I need a different solution. That's how I stumbled upon RunPod, but I can't seem to figure out where to start....

serverless container disk storage size vs network volume

When I add a Serverless endpoint, it defaults to a 5GB container disk. I tried to change it to a crazy high number like 50000GB, and it seems to accept it?? I'm confused: is this disk storage physically attached to the GPU machine? Is there a limit on this storage size? Does it cost any extra money? My ComfyUI docker image needs to download many different models, which can be hundreds of GB in total. What would happen if it exceeds the storage limit? If I choose to attach a network volume, does it mean my docker image (containing many different models) will be deployed and stored in the network volume? And the deployed volume needs to communicate with the GPU via network requests, so it will be slower? ...

Serverless Endpoint failing occasionally

I'm pretty new to RunPod and started off with a serverless endpoint. When calling the API I sometimes get a FAILED response as the return, but I can't really retrace what is wrong exactly. Calling the same API with the same input again works, and the logs don't provide more information. How can I figure out what is causing this error? Is there a best practice to catch the FAILED calls and analyze why they occur? Happy for any help!
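One pattern for catching these (a minimal sketch, not RunPod-prescribed; the retry policy and logging setup here are assumptions) is to check the `status` field of every response, log the full payload of any non-COMPLETED call for later analysis, and retry once:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("runpod-client")

def should_retry(response: dict, attempt: int, max_attempts: int = 2) -> bool:
    """Decide whether a /runsync response warrants one more attempt.

    RunPod job responses carry a "status" field such as COMPLETED,
    FAILED, or IN_QUEUE; anything other than COMPLETED on a sync call
    is treated as a failure in this sketch.
    """
    status = response.get("status")
    if status == "COMPLETED":
        return False
    # Keep the whole failed payload so the error can be analyzed later.
    log.error("attempt %d failed: %s", attempt, json.dumps(response))
    return attempt < max_attempts

# Example: a FAILED response triggers exactly one retry.
failed = {"status": "FAILED", "error": "worker exited", "id": "abc"}
print(should_retry(failed, attempt=1))   # True  -> retry once
print(should_retry(failed, attempt=2))   # False -> give up, inspect the log
```

Persisting the failed payload alongside the exact input that produced it is usually enough to spot whether the failures correlate with a particular worker or input shape.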

Serverless can take several minutes to initialise...?

Should this be the case? An on-demand H100 is pointless to our business if I have to wait 30 seconds on average for it to start up and sometimes several minutes.

Maximum size of single output for streaming handlers

We are currently trying to refactor our RunPod handlers to work asynchronously with streaming of results. Unfortunately, we ran into this error when trying to yield result images:
2024-04-08T09:34:46.608281091Z {"requestId": "75dd8d62-adde-402a-902b-bbef06d90064-e1", "message": "Failed to return job results. | 400, message='Bad Request', url=URL('https://api.runpod.ai/v2/s6d4fprlj0v7k5/job-stream/9m2ossyhvxlp9a/75dd8d62-adde-402a-902b-bbef06d90064-e1?gpu=NVIDIA+L4&isStream=false')", "level": "ERROR"}
...
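The 400 above suggests the job-stream endpoint rejected an oversized single yield. One workaround (a sketch; the chunk size and the client-side reassembly are assumptions, not a documented RunPod limit) is to split a large base64-encoded image across several smaller yields and concatenate them on the client:

```python
import base64

def chunk_output(data: bytes, chunk_size: int = 512 * 1024):
    """Yield a large payload as base64 pieces small enough to stream.

    Each piece carries an index so the client can reassemble in order.
    512 KiB per chunk is an arbitrary choice for this sketch.
    """
    encoded = base64.b64encode(data).decode("ascii")
    total = (len(encoded) + chunk_size - 1) // chunk_size
    for i in range(total):
        yield {"chunk": i, "of": total,
               "data": encoded[i * chunk_size:(i + 1) * chunk_size]}

# Client side: concatenate the pieces, then decode once.
payload = b"\x89PNG" + bytes(2_000_000)          # stand-in for an image
pieces = list(chunk_output(payload))
reassembled = base64.b64decode("".join(p["data"] for p in pieces))
assert reassembled == payload
```

Inside a generator handler, each dict from `chunk_output` would be yielded in place of the whole image result.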

401 Unauthorized

Having used a combination of ashleyk's README and the generativelabs video, I have set up a RunPod oobabooga worker. However, I am unable to make calls to the endpoint, resulting in a 401 Unauthorized. I have set up Postman as the generativelabs video suggested, and I have quadruple-checked that I have properly set up the API ID and API key in Postman and SAVED. I am also using the format as per @ashleyk's documentation (in fact I'm using the exact prompt), but I am still getting 401 Unauthorized. As such, I'm at a loss. Hope I can get help...
Solution:
I ended up solving it by putting the API key under Authorization as type Bearer instead of in the Headers tab, and by manually putting the endpoint ID into the URL.
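Outside Postman, the same fix looks like this with a plain stdlib request (a sketch; the endpoint ID `abc123` and the key value are placeholders you'd replace with your own):

```python
import json
import urllib.request

API_KEY = "YOUR_RUNPOD_API_KEY"          # placeholder
ENDPOINT_ID = "abc123"                   # placeholder endpoint ID

def build_request(prompt: str) -> urllib.request.Request:
    """Build a /runsync request with the API key as a Bearer token.

    The 401 in this thread came from putting the key in the wrong place;
    it belongs in an "Authorization: Bearer <key>" header.
    """
    body = json.dumps({"input": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("Hello")
print(req.get_header("Authorization"))   # Bearer YOUR_RUNPOD_API_KEY
```

Sending the request with `urllib.request.urlopen(req)` (or the equivalent in `requests`) then authenticates without the 401.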

Serverless suddenly stopped working

All my serverless machines disappeared, including my always-on machines. My endpoint is now stuck at initializing and my production service is down. This has been the case for the past 2 hours or so. Any idea what is happening? My endpoint is nrbt6cd41ed5he.

Balance Disappeared

Hi, I had an account on RunPod a long while ago, before all the UI changed. I had a Pod as well. I recently logged back into my account and all the UI had changed. More importantly, my serverless Pod was gone, as was the $66 I had in my balance.

Having problems working with the `Llama-2-7b-chat-hf`

I have the following request going to the runsync endpoint:
```
{ "input": { "prompt": "the context. Give me all the places and year numbers listed in the text above"...
```

Question about billing

My app is idle most of the time; from time to time I will come with a context and ask my LLM model questions about it. While my app is idle, do I pay anything for this idle time except disk space? Thanks....

2 active workers on serverless endpoint keep rebooting

We have 2 active workers on a serverless endpoint, sometimes the workers reboot at the same time for some reason, which causes major problems in our system.
2024-04-03T14:37:16Z create pod network
2024-04-03T14:37:16Z create container endpoint-image:1.2
2024-04-03T14:37:17Z start container
...

Billing increases last two days heavily from delay time in RTX 4000 Ada

I checked my billing history and saw that my serverless bills increased a lot, and the culprit is the usage of RTX 4000 Ada. I checked the logs and it's because of a CUDA runtime error: ONNX Runtime doesn't work with this specific GPU. This causes the container to retry again and again, increasing the delay time. I have never hit this issue before, so I'm not sure why it's happening now without any code change on my side. ...

Error: CUDA error: CUDA-capable device(s) is/are busy or unavailable

I have 15 production endpoints deployed on RunPod, and today they started to raise this error randomly. Do you know what is happening? I am worried about this because it's generating a bad experience for the users of my product. Thanks

Auto-scaling issues with A1111

Hey, I'm running an A1111 worker (https://github.com/ashleykleynhans/runpod-worker-a1111) on Serverless, but there is an issue with auto-scaling. The problem is that a newly added worker becomes available (green) before A1111 has booted. Because of this, new requests are instantly sent to the new worker, and older workers are shut down if they haven't received any requests within 5 seconds. This usually results in all active workers shutting down and a long queue building up, because all the newly added workers haven't booted A1111 yet. I tried increasing the idle timeout, e.g. to 180 seconds, but in that case the workers never scale down....
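One mitigation (a sketch; the probe target, poll interval, and timeout are assumptions) is to block the worker's startup until the local A1111 API actually responds, so the worker only turns green once it can serve requests:

```python
import time

def wait_until_ready(check_ready, timeout: float = 300.0,
                     interval: float = 1.0, clock=time.monotonic,
                     sleep=time.sleep) -> bool:
    """Poll check_ready() until it returns True or the timeout elapses.

    In a worker, check_ready would probe the local A1111 API (e.g. a
    GET against http://127.0.0.1:3000/sdapi/v1/sd-models); here it is
    injected so the logic can be demonstrated without a server.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if check_ready():
            return True
        sleep(interval)
    return False

# Simulate a backend that becomes ready on the third probe.
probes = iter([False, False, True])
ready = wait_until_ready(lambda: next(probes), timeout=10, sleep=lambda s: None)
print(ready)  # True
```

Calling this before the handler loop starts (i.e. before `runpod.serverless.start(...)` in the worker) keeps requests from landing on a worker whose A1111 is still booting.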

How to make Supir in Serverless?

Please tell me how to create a serverless endpoint with the SUPIR project? Or perhaps someone can do this for money? https://github.com/chenxwh/SUPIR...
Solution:
For money it should be doable.

Can we use serverless faster Whisper for local audio?

I deployed Faster Whisper using serverless and invoked it using:
```
import requests
url = "https://api.runpod.ai/v2/faster-whisper/runsync"
```
...
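For local audio, a common approach (a sketch; the `audio_base64` input key is an assumption — check the deployed worker's input schema) is to base64-encode the file and put it in the request input instead of passing a URL:

```python
import base64
import json

def build_payload(audio_path: str) -> str:
    """Encode a local audio file into a JSON payload for /runsync.

    The "audio_base64" input key is an assumption for this sketch; the
    deployed faster-whisper worker's schema is authoritative.
    """
    with open(audio_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return json.dumps({"input": {"audio_base64": encoded}})

# Round-trip check with a tiny stand-in file.
with open("clip.wav", "wb") as f:
    f.write(b"RIFF....WAVE")
payload = json.loads(build_payload("clip.wav"))
print(base64.b64decode(payload["input"]["audio_base64"]))  # b'RIFF....WAVE'
```

The resulting string would be POSTed as the request body, with the worker decoding the audio back to bytes on its side.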

is there any method to deploy bert architecture models serverlessly?

Solution:
@Adam? https://www.runpod.io/console/explore then select this...

NGC containers

Has anyone gotten NGC containers running on RunPod? I see it as an option, but I think it doesn't work because you need to install the SSH libraries on top. I need this to use FP8 on H100s, since the PyTorch NGC container includes Transformer Engine for FP8. Building Transformer Engine manually takes a long time (it requires downloading a cuDNN tarball from the NVIDIA website)....