Runpod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


Serverless After Docker Image Pull, Error (failed to register layer: Container ID 197609 cannot...)

detail system logs:
```plaintext
2024-03-13T07:42:28Z ffd6a93e8d50 Extracting [====================================> ] 122MB/165.8MB
2024-03-13T07:42:28Z ffd6a93e8d50 Extracting [=====================================> ] 125.9MB/165.8MB
...
```

Failed Serverless Jobs drain Complete Balance

Hi, just like this GitHub issue (https://github.com/runpod-workers/worker-vllm/issues/29), I had my balance drained completely multiple times because serverless jobs got stuck and automatically restarted. Jobs can fail for many different reasons, so testing them thoroughly is very hard without a higher load. A month ago it was announced that a feature to solve the issue would be introduced (see image). However, I could not find any configuration in the UI to limit the number of retries for a failed serverless inference, only a configuration to enable the Execution Timeout. Therefore, two questions: 1. Is the feature to automatically kill jobs after n failed execution attempts already introduced but not configurable by the user? If so, what is the limit? 2. Is the total execution timeout (configurable per endpoint or per request via API) counted per job execution or per job? E.g. would a limit of 100 seconds only be reached if the job ran for 100 seconds without interruption, or would it also be reached if the job ran a first time, failed after 60 seconds, and then ran a second time without failure for (more than) 40 seconds?...

Serverless multi gpu

I have a model deployed on 2x 48 GB GPUs and 1 worker. It ran correctly the first time with CUDA distributed, but then it fails with:
```plaintext
"error_message": "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat)",
"error_traceback": "Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py" ...
```
What can be the issue here?...
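This `torch.cat` error usually means the tensors being concatenated live on different GPUs. A minimal sketch of one common fix, assuming PyTorch, is to move everything onto a single target device first (`cat_on_one_device` is an illustrative helper, not part of any worker):

```python
import torch

def cat_on_one_device(tensors, device=None):
    """Concatenate tensors after moving them all onto one device."""
    if device is None:
        device = tensors[0].device
    # .to() is a no-op when a tensor is already on the target device
    return torch.cat([t.to(device) for t in tensors], dim=0)
```

Whether this is the right fix depends on why the model's layers ended up split across cuda:0 and cuda:1 in the first place (e.g. a `device_map` that changed between runs).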

How can I make a follow-up question to the endpoint?

How can I send a follow-up question to the endpoint, like a thread? E.g. a chat in ChatGPT.
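Serverless endpoints are stateless, so a ChatGPT-style "thread" is usually kept on the client: resend the accumulated message history with every request. A hedged sketch, where `send()` is a placeholder for your own endpoint call and the payload shape depends on your worker:

```python
def ask(history, question, send):
    """Append the question, call the endpoint once, record the reply."""
    history.append({"role": "user", "content": question})
    reply = send({"messages": history})  # one stateless endpoint call
    history.append({"role": "assistant", "content": reply})
    return reply
```

Each call carries the whole conversation so far, which is how the worker "remembers" earlier turns.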

Illegal Construction

When building a mock serverless endpoint to test locally against test_input.json, I am not receiving the --- Starting Serverless Worker | Version 1.6.2 --- log in my container upon run....
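For comparison, a minimal worker entrypoint sketch, assuming the runpod Python SDK (the handler body and the "prompt" field are illustrative placeholders). Run with a test_input.json next to the file; the banner should appear if the SDK starts correctly:

```python
def handler(job):
    # job["input"] mirrors the "input" key of test_input.json
    prompt = job["input"].get("prompt", "")
    return {"echo": prompt}

if __name__ == "__main__":
    import runpod  # imported here so the handler itself stays SDK-free
    runpod.serverless.start({"handler": handler})
```

If the banner never prints, a common cause is the entrypoint script not actually being executed as the container's main process.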

Serverless cost

I want to deploy my model on serverless. How is it priced?

What is the difference between setting execution timeout on an endpoint and setting in the request?

What is the difference between the screenshot and:
```json
{
  "policy": {
    "executionTimeout": 300000
    ...
```
Solution:
Oh never mind, the answer is in the docs.
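For anyone landing here later: the per-request form is sent in the request body alongside the input. A hedged sketch of the shape (values are placeholders; the 300000 value suggests the timeout is in milliseconds):

```json
{
  "input": { "prompt": "..." },
  "policy": { "executionTimeout": 300000 }
}
```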

Serverless custom routes

Hi there. I'd like to implement my own streaming custom routes like the vllm worker ( https://github.com/runpod-workers/worker-vllm ). This worker supports routes like https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1. How is this done? When I look in the source code, that worker gets special keys passed to it in the rp handler, like job_input.openai_route. Where does this key come from?
Thanks. Jon....

What is N95 in serverless metrics?

I finally understand the percentiles (P70, P90, P98) for serverless metrics, but I don't understand what N95 is. Can someone please explain what it is, how it's calculated, and what its significance is?
Solution:
From what I can see, yes.

venv isolation in network volume

My project consists of multiple services, with each service corresponding to a serverless worker, and each worker being built from a different subproject. For venv isolation, should the venv be deployed within each subproject to prevent dependency conflicts between different subprojects?

serverless multi-gpu

Hi, in the serverless endpoint console I'm seeing that you can't have a serverless multi-gpu endpoint except for 2x A40? Is this correct? So essentially the serverless product is only for smaller models?
Solution:
We are slowly allowing more as we get more available capacity.

Serverless API Question

Hi, I am currently using this guide https://doc.runpod.io/reference/runpod-apis and attempting to retrieve my results with the status request. However, the response I get just looks like:
```json
{
  "delayTime": 66679,
  "executionTime": 41266,
  "id": "5dbd4fb0-b6f9-44d9-a242-820d9ddbc929-u1",
  ...
```
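A response with timing fields but no output usually means you should check the "status" field and keep polling until the job reaches a terminal state. A hedged sketch of a status-polling loop (endpoint ID, API key, and the terminal status names follow the guide linked above; treat them as assumptions):

```python
import json
import time
import urllib.request

def job_finished(status_payload: dict) -> bool:
    """True once the job has reached a terminal state."""
    return status_payload.get("status") in ("COMPLETED", "FAILED", "CANCELLED")

def poll_status(endpoint_id: str, api_key: str, job_id: str, interval: float = 2.0) -> dict:
    """Poll the /status route until the job finishes, then return the payload."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    while True:
        with urllib.request.urlopen(req) as resp:
            payload = json.load(resp)
        if job_finished(payload):
            return payload
        time.sleep(interval)
```

Once the status is COMPLETED, the "output" key should be present in the payload.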

Serverless endpoint: 1 job always queued, even with 3 workers running

After 600 s there is still 1 job in the queue, and the logs show nothing. How can I see what is running? This morning when I was using a GPU pod, I was prompted that an ip_adapter was not found, but now I can't see any output. My local project does have an ip_adapter.

Serverless Inference

Hi, I have been using Runpod to train my model, and am very interested in using serverless computing to deploy it. I have successfully created a Docker image that loads the model and contains an inference endpoint function. However, the model is rather large, and I am curious if there is a way to hold the model in RAM to avoid loading it every time the container is stopped and restarted? If not, could anyone recommend another resource for model deployment? Is a traditional server a better optio...
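Warm workers stay alive between jobs, so anything loaded once per process is reused until the worker itself is stopped (a fully stopped container does have to reload). A minimal sketch of the lazy-load pattern, where `loader` stands in for your own model-loading code:

```python
_MODEL = None

def get_model(loader):
    """Load the model on the first call only; return the cached copy after."""
    global _MODEL
    if _MODEL is None:
        _MODEL = loader()  # expensive load happens once per worker process
    return _MODEL
```

Handlers then call `get_model(...)` per job: the first job on a worker pays the load cost, subsequent jobs on the same warm worker do not.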

Serverless can't connect to s3

Hey guys! I'm currently trying to get my serverless workers to retrieve a video from an Amazon S3 bucket; however, I am unable to gain access despite providing the AWS key and secret key as mentioned in the documentation. If I make my S3 bucket public, it works, so I believe it's an issue with the authentication. I'm using the Faster-Whisper template. I've just thrown this into my code and I think it doesn't actually do anything. ...
Solution:
Can't you just generate a presigned URL from your S3 bucket?

How to sign up for dev.runpod.io?

I keep getting a "not allowed to access" error.
Solution:
As the name says, this page is for development, not for customers.

Worker configuration for serverless

Hello, when I edit my endpoint, I choose a configuration and there are numbers displayed. What are they? Is it a priority, like try config 1 first, and if it errors or is unavailable, fall back to config 2, etc.? Also, it seems that when I edit, the modification isn't taken into account. I unselected a GPU but it's still running on it, so I was wondering if editing is broken and whether I should create a new endpoint?...

connection closed by remote host

```plaintext
Connection to 69.30.85.26 closed by remote host.
rsync: [sender] write error: Broken pipe (32)
rsync error: unexplained error (code 255) at io.c(848) [sender=3.2.7]
Activating Python virtual environment /7bb8882d/venv on Pod 9h5hmefddu8msr
Creating Project watcher...
```

When using runpodctl project dev to upload a project, is there a speed limit?

From my tests, the speed seems to be around 1.2 MB per second. The first upload includes several large models totaling over 10 GB, which takes a considerable amount of time to complete.

Request Stuck in Queue

Haven't seen a single log; I don't know where to start debugging.