RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

ssssteven · 1/10/2024

RuntimeError: The NVIDIA driver on your system is too old (found version 11080). Please update your

I deployed a new version today but keep running into this error. Did something change on RunPod? Thanks!
Reason · 1/10/2024

Worker log says remove container, remove network?

Not even sure this is an issue, but one of the endpoints I'm testing has a throttled worker with an odd output in its log. I'm not sure if it's crashed and been removed, or just deallocated or something? ``` 2024-01-10T14:00:00Z create pod network 2024-01-10T14:00:00Z create container ghcr.io/bartlettd/worker-vllm:main ...
Solution:
That's normal, unless the worker is running.
Derwe · 1/10/2024

Hi all. I created a pod, started it, but can't ssh, can't start its "web terminal", can't do anythin

I've created a new pod, started it, added the RSA keys, etc… however, can't ssh; Error response from daemon: Container f3aeaa504300180e74107f909c00ece20c4e18925c55c45793c83c9d3dc52852 is not running Connection to 100.65.13.88 closed. Connection to ssh.runpod.io closed....
Reason · 1/10/2024

Should I be getting billed during initialization?

Trying to understand exactly how serverless billing works with respect to workers initialising. From the GUI, the behaviour is inconsistent and I can't find an explanation in the docs. I have an example where workers are pulling a Docker image: one of the workers says it's ready despite still pulling the image, while the other two are in the initialising state. The indicator in the bottom right shows the per-second pricing for one worker, which would make sense if it's active, but it clearly isn't ready to accept jobs. Also, pulling images from the GitHub container registry takes an absolute age; I'd be disappointed about getting charged more because of network congestion....
Solution:
We have seen this happen if you update your container image using the same tag.
foxhound · 1/9/2024

[RUNPOD] Minimize Worker Load Time (Serverless)

Hey fellow developers, I'm currently facing a challenge with worker load time in my setup. I'm using a network volume for models, which is working well. However, I'm struggling with Dockerfile re-installing Python dependencies, taking around 70 seconds. API request handling is smooth, clocking in at 15 seconds, but if the worker goes inactive, the 70-second wait for the next request is a bottleneck. Any suggestions on optimizing this process? Can I use a network volume for Python dependencies like I do for models, or are there any creative solutions out there? Sadly, no budget for an active worker....
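A network volume can hold Python packages the same way it holds models. A minimal sketch, assuming the dependencies were installed to the volume once ahead of time (e.g. `pip install --target /runpod-volume/python-deps -r requirements.txt` — the `python-deps` directory name is hypothetical): the handler then just prepends that directory to `sys.path` at startup instead of reinstalling on every cold start.

```python
import os
import sys

# Hypothetical location on the network volume where dependencies were
# pre-installed once with: pip install --target <dir> -r requirements.txt
DEPS_DIR = os.environ.get("DEPS_DIR", "/runpod-volume/python-deps")

def add_volume_deps(path: str = DEPS_DIR) -> bool:
    """Prepend a pre-installed dependency directory to sys.path.

    Returns True if the directory exists and was added, False if it is
    missing or already on the path."""
    if os.path.isdir(path) and path not in sys.path:
        sys.path.insert(0, path)
        return True
    return False
```

Calling `add_volume_deps()` as the first thing in the handler file lets subsequent `import` statements resolve against the volume, trading the 70-second reinstall for network-volume read latency.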
Concept · 1/9/2024

Runpod VLLM Context Window

Hi, I've been using this template in my serverless endpoint: https://github.com/runpod-workers/worker-vllm I'm wondering what my context window is / how it's handling chat history? ...
ardra · 1/9/2024

Real time transcription using Serverless

Creating a handler file for a real-time transcription app.
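A minimal sketch of what such a handler file could look like. The `transcribe` function and the `audio_url` input field are placeholders for illustration; a real app would call a speech-to-text model (e.g. Whisper) there, and the `runpod` import assumes the RunPod Python SDK is installed in the worker image.

```python
def transcribe(audio_url: str) -> str:
    # Placeholder: download `audio_url` and run a speech-to-text model here.
    return f"transcript of {audio_url}"

def handler(job: dict) -> dict:
    """RunPod serverless handler: receives a job dict with an 'input' key
    and returns a JSON-serializable result."""
    audio_url = job.get("input", {}).get("audio_url")
    if not audio_url:
        return {"error": "missing 'audio_url' in input"}
    return {"transcript": transcribe(audio_url)}

def start_worker():
    # In the worker container's entrypoint, register the handler with the
    # RunPod SDK, which then polls the endpoint queue for jobs.
    import runpod  # provided by the `runpod` pip package in the worker image
    runpod.serverless.start({"handler": handler})
```

For truly real-time (streaming) output, the SDK also supports generator handlers that yield partial results, which would fit incremental transcription better than a single return value.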
Jack · 1/9/2024

Failed to load library libonnxruntime_providers_cuda.so

Here is the full error: [E:onnxruntime:Default, provider_bridge_ort.cc:1480 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1193 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcufft.so.10: cannot open shared object file: No such file or directory I am running AUTOMATIC1111 on Serverless Endpoints using a Network Volume. I am using the faceswaplab extension. In this extension, there is the option to use GPU (by default, the extension only uses CPU). When I turn on the Use GPU option, I get the error....
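An error like `libcufft.so.10: cannot open shared object file` usually means the CUDA toolkit runtime libraries aren't visible to the dynamic linker inside the container (wrong base image or `LD_LIBRARY_PATH`). A small diagnostic sketch, not a fix, that reports which CUDA runtime libraries the linker can actually resolve:

```python
import ctypes.util

def find_cuda_libs(names=("cufft", "cudart", "cublas")):
    """Map each CUDA runtime library name to the filename the dynamic
    linker resolves it to, or None if it cannot be found (the situation
    behind 'cannot open shared object file' errors)."""
    return {name: ctypes.util.find_library(name) for name in names}
```

Running this inside the worker and seeing `None` for `cufft` would confirm the library is missing from the image rather than the extension being misconfigured.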
wizardjoe · 1/9/2024

Setting up MODEL_BASE_PATH when building worker-vllm image

I'm a little confused about this parameter in setting up worker-vllm. It seems to default to /runpod-volume, which to me implies a network volume, instead of getting baked into the image, but I'm not sure. A few questions: 1) If set to "/runpod-volume", does this mean that the model will be downloaded to that path automatically, and therefore won't be a part of the image (resulting in a much smaller image)? 2) Will I therefore need to set up a network volume when creating the endpoint? 3) Does the model get downloaded every time workers are created from a cold start? If not, then will I need to "run" a worker for a given amount of time at first to download the model?...
ssssteven · 1/7/2024

What do the delay time and execution time mean in the Requests page?

Hey all, I'm not sure what the delay time means in the Requests page. Is it about the cold start? Could someone help me understand it? Also, the execution time seems to be way larger than the duration I've logged. Does the execution time mean the execution time of the handler function? Thanks!
wmute · 1/7/2024

Extremely slow Delay Time

We are using 2 serverless endpoints on RunPod, and the "Delay Time" (which I assume measures end-to-end time) varies drastically between the endpoints. They both use the same hardware (the A5000 option); one of them has sub-second delay times and the other ~50 seconds, up to 180s. On the slow endpoint, the worst cold start time is reported as 13s and the execution time is ~2s, which don't add up to the delay time. There are ~50 seconds unaccounted for. The other endpoint using the same hardware does not observe such drastic delay times....
wmute · 1/7/2024

Custom template: update environment variables?

I have configured environment variables in my custom endpoint template. When I edit the template to change their contents, the workers still seem to be using the old ENV values. What ultimately works is removing and recreating the whole endpoint, but I don't want to do that repeatedly. I've tried triggering a refresh using the "Create new release" functionality but it didn't seem to help. What is the recommended way of making sure that the workers are using the latest environment variables from the template?...
Casper. · 1/7/2024

Delay on startup: How long for low usage?

I am trying to gauge the actual cold start for a 7B LLM deployed with vLLM. My ideal configuration is something like this: 0 active workers, 5 requests/hour, and up to between 100-200 seconds of generation time. How long would it take for RunPod to do a cold start with delay time and everything? Essentially, what is the min, avg, max in terms of time to first token generated?...
kecikeci · 1/7/2024

Why not push results to my webhook??

Why not push results to my webhook
lucasavila00 · 1/7/2024

Restarting without error message

I'm deploying some code to serverless, and it seems the code crashes and restarts the process without an error message. The logs just show that it has restarted; I can tell by my own startup logging. In the end I could make it work by using a specific version of CUDA and a specific version of a dependency, but I would like to know why it crashes, to fix it. Everything works fine locally with nvidia-docker......
ssssteven · 1/7/2024

Set timeout on each job

Hello, is there any way to set a hard limit timeout for each job? Thank you!
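RunPod's serverless API accepts a per-job policy with an `executionTimeout` field (in milliseconds) alongside the job input; worth verifying against the current docs. A sketch that assembles such a request, with a hypothetical endpoint id and API key:

```python
import json

def build_run_request(endpoint_id: str, api_key: str, job_input: dict, timeout_ms: int):
    """Assemble the URL, headers, and JSON body for a /run call whose job
    carries a hard execution timeout (milliseconds) in its policy."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/run"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "input": job_input,
        "policy": {"executionTimeout": timeout_ms},
    })
    return url, headers, body
```

The returned triple can be POSTed with any HTTP client; a job exceeding the timeout is terminated, which caps the billable execution time for that request.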
Ian Chen · 1/6/2024

issues using serverless with webhook to AWS API Gateway

For some reason my API gateway does not receive any of the requests from RunPod. My API gateway does not require any authorization, so I can't think of why they don't go through. My RunPod endpoint id is duy9bf9dm50ag7...
ssssteven · 1/6/2024

Monitor Logs from command line

Hello all, is there any command line tool to monitor an endpoint without opening up the webpage? Thanks!
Solution:
You can use your browser's console to check the API calls that are made and then use the API to get the logs
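Along those lines, a minimal sketch of hitting the serverless health route from a script instead of the web UI. It assumes RunPod's `GET /v2/{endpoint_id}/health` route (which reports worker and job counts); verify the route and response shape against the current API docs. The endpoint id and key are hypothetical.

```python
import json
from urllib import request

def endpoint_health_request(endpoint_id: str, api_key: str) -> request.Request:
    """Build an authenticated GET request for the serverless health route,
    which reports worker and queued/running job counts."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/health"
    return request.Request(url, headers={"Authorization": f"Bearer {api_key}"})

def fetch_health(endpoint_id: str, api_key: str) -> dict:
    """Perform the request and decode the JSON response."""
    with request.urlopen(endpoint_health_request(endpoint_id, api_key)) as resp:
        return json.load(resp)
```

Running `fetch_health(...)` in a `watch`-style loop gives a rough command-line monitor without the dashboard.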
blistick · 1/5/2024

What does "throttled" mean?

My endpoint dashboard sometimes shows "1 Throttled" worker, and 0 other workers, except for queued ones. What does the "throttled" status mean, and how do I prevent the condition?
wizardjoe · 1/4/2024

Error building worker-vllm docker image for mixtral 8x7b

I'm running the following command to build and tag a Docker worker image based on worker-vllm: docker build -t lesterhnh/mixtral-8x7b-instruct-v0.1-runpod-serverless:1.0 --build-arg MODEL_NAME="mistralai/Mixtral-8x7B-Instruct-v0.1" --build-arg MODEL_BASE_PATH="/models" . I'm getting the following error:...