Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Serverless Unable to SSH / Use Jupyter Notebook Anymore

When I first started using Runpod, if I had an active worker I could SSH in or use a Jupyter notebook, as long as I had SSH open / a notebook launched on the pod. But now when I try to SSH, it just throws an error: ``` Justins-MBP ~ % ssh m3k8sad75isko8-64410faa@ssh.runpod.io -i ~/.ssh/id_ed25519...

Editing Serverless Template ENV Variable

When I edit an env variable on a serverless template, does it update in real time? I can't quite tell, and I'm wondering what happens under the hood. Do I need to refresh the workers myself, or will idle workers automatically pick up the new env variables when they go active?

llama.cpp serverless endpoint

https://github.com/ggerganov/llama.cpp
llama.cpp is AFAIK the only setup that supports quantized LLaVA-1.6, which is why I use it. On some workers the Docker image works; on others it crashes with an "illegal instruction" error. https://github.com/ggerganov/llama.cpp/issues/537...
Solution:
I don't know why you would want to use llama.cpp; it's more for offloading onto CPU than for GPU. You can look at using this instead: https://github.com/ashleykleynhans/runpod-worker-llava...

comfyui + runpod serverless

I'm looking to host my ComfyUI workflow via Runpod serverless, and I'm curious how the ComfyUI startup process works with serverless. For example, in my local setup, every time I restart my ComfyUI localhost it takes a while to get up and running; let's call this the "ComfyUI cold start". But once it is set up, it's relatively quick to run many generations one after another. My question: ...
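
One common pattern for this (a sketch under assumptions, not Runpod's official recipe): launch ComfyUI once at module import time, outside the handler, so the "ComfyUI cold start" is paid once per worker rather than once per request; each job then just posts a workflow to the already-running local server. The start command, the path /comfyui/main.py, port 8188, and the /prompt route below reflect a default ComfyUI install and are assumptions.

```python
# Sketch: launch ComfyUI once per worker, then reuse it for every job.
# Install path, port, and the /prompt API are assumptions for a default ComfyUI setup.
import subprocess
import time

import requests
import runpod

COMFY_URL = "http://127.0.0.1:8188"

# Started at module import, i.e. once per worker, not once per request.
comfy_proc = subprocess.Popen(["python", "/comfyui/main.py", "--listen", "127.0.0.1"])

def wait_for_comfy(timeout=120):
    """Poll until the ComfyUI HTTP server answers."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            requests.get(COMFY_URL, timeout=2)
            return
        except requests.RequestException:
            time.sleep(1)
    raise RuntimeError("ComfyUI did not come up in time")

wait_for_comfy()

def handler(job):
    # Each job only submits a workflow to the already-running server.
    workflow = job["input"]["workflow"]
    resp = requests.post(f"{COMFY_URL}/prompt", json={"prompt": workflow}, timeout=30)
    return resp.json()

runpod.serverless.start({"handler": handler})
```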

ECC errors on serverless workers using L4

We are currently using L4 machines in the EU-RO region for our production environment (30-70 workers). Based on the request data, we have seen increasing hardware issues related to ECC errors and were wondering if we could get help mitigating these failures. ``` "handler: CUDA error: uncorrectable ECC error encountered\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions....

Does Runpod Autoupdate Images now for non-matching hashes?

I had only idle workers, and I sent a request to do some testing; suddenly, it started downloading a new image. The only explanation I have is a CI/CD pipeline I'm testing that pushed a new image with the same name. Is Runpod now downloading new images if the hashes don't match? You can see it because instead of being in "initializing" it's a green worker....

vLLM Memory Error / Runpod Error?

https://pastebin.com/vjSgS4up
Error initializing vLLM engine: The model's max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (24144). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
...
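
For reference, a minimal sketch of what the error message is asking for, using the plain vLLM Python API (the model name and exact values are placeholders; if you're on the prebuilt vLLM worker image, the same knobs are typically exposed through the endpoint template's environment variables):

```python
from vllm import LLM

# Either shrink the context window below the KV-cache limit reported in the error,
# or give vLLM more of the GPU to enlarge the KV cache (default utilization is 0.90).
llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model
    max_model_len=16384,                          # below the 24144-token limit from the error
    gpu_memory_utilization=0.95,
)
```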

How do I correctly stream results using runpod-python?

Currently, I'm doing the following: ------- import runpod...
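
A minimal sketch of one way to do it with the runpod-python SDK, assuming the endpoint's handler is a generator and that the SDK's job object exposes a stream() iterator (endpoint ID, API key, and payload are placeholders):

```python
import runpod

runpod.api_key = "YOUR_API_KEY"            # placeholder
endpoint = runpod.Endpoint("ENDPOINT_ID")  # placeholder endpoint ID

# run() submits the job asynchronously; stream() then yields partial outputs
# as the generator handler produces them (assumed SDK behaviour).
job = endpoint.run({"prompt": "Hello"})

for chunk in job.stream():
    print(chunk)

print(job.output())  # final/aggregated result once the job completes
```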

Status endpoint only returns "COMPLETED" but no answer to the question

I'm currently using the v2/model_id/status/run_id endpoint and the result I get is as follows: {"delayTime": 26083, "executionTime": 35737, "id": **, "status": "COMPLETED"}. My stream endpoint works fine, but for my purposes I'd rather wait longer and retrieve the entire result at once. How am I supposed to do that? ...
Solution:
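
A hedged guess at the cause, sketched below: with a generator (streaming) handler, the /status endpoint only carries the full output if the worker is told to aggregate the stream; in runpod-python that is the return_aggregate_stream option (option name assumed).

```python
import runpod

def handler(job):
    # Generator handler: yields chunks for the /stream endpoint.
    for token in ["partial ", "answer ", "here"]:
        yield token

runpod.serverless.start({
    "handler": handler,
    # Assumed flag: collects the yielded chunks so /status also returns the full output.
    "return_aggregate_stream": True,
})
```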

24GB PRO availability in RO

I switched from the 24GB tier in RO to 24GB PRO to benefit from the higher availability of the 4090s in RO, but most of my workers are becoming throttled again.

Deepseek coder on serverless

Hello, new serverless user here. I would be using the vLLM worker, so whenever it gets spun up from a cold start, does it have to download the model every time? I'd be running it in fp16, which means about 14 GB of data to download.

How to write a file to persistent storage on Serverless?

Hey guys, can someone help me write a file to persistent storage on Serverless? I want to then allow users to download it directly from the storage, and clean up the volume after 24 hours. Any help here would be great!...
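
A minimal sketch of the writing part, assuming the attached network volume is mounted at /runpod-volume on serverless workers (the mount path is an assumption; serving the download to users and the 24-hour cleanup would need a separate process, e.g. a pod or an external object store):

```python
import os
import uuid

import runpod

# Assumed mount point of the attached network volume on serverless workers.
VOLUME_ROOT = "/runpod-volume"

def handler(job):
    out_dir = os.path.join(VOLUME_ROOT, "outputs")
    os.makedirs(out_dir, exist_ok=True)

    # Write the result under a unique name so concurrent jobs don't collide.
    filename = f"{uuid.uuid4()}.txt"
    with open(os.path.join(out_dir, filename), "w") as f:
        f.write(job["input"]["content"])

    # Return the filename (or a key) so another service can serve and clean it up later.
    return {"file": filename}

runpod.serverless.start({"handler": handler})
```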

Run LLM Model on Runpod Serverless

Hi there, I have an LLM model built into a Docker image, and the image is 40GB+. I'm wondering, can I mount the model as a volume instead of adding the model to the Docker image?...

Safetensor safeopen OS Error device not found

Running inference on a serverless endpoint, and this line of code:
with safetensors.safe_open(path, framework="pt", device="cpu") as f:
...

Directing requests from the same user to the same worker

Guys, thank you for your work. We are enjoying your platform. I have the following workflow: on the first request from a user, the worker does some hard stuff for about 15-20s and caches it, and all subsequent requests are very fast (~150ms). But if one of the subsequent requests goes to another worker, it has to repeat the hard stuff (15-20s). Is there any way to direct all subsequent calls from the same user to the same worker?...
Solution:
Just a summary so I can mark this solution: 1) Use network storage to persist data between runs. 2) Use an outside file storage / object storage provider. 3) If using Google Cloud / an S3 bucket, large files can use parallel downloads / uploads; there should be existing tooling out there, or you can obviously make your own...
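
A hedged sketch of option 1 from that summary: rather than pinning a user to a worker, persist the expensive per-user artifact on the shared network volume so any worker can reuse it (the /runpod-volume mount path and pickling of the cached state are assumptions):

```python
import os
import pickle

# Assumed serverless mount point of the shared network volume.
CACHE_DIR = "/runpod-volume/user_cache"

def get_user_state(user_id, build_fn):
    """Load the expensive per-user state from the shared volume, or build and cache it."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, f"{user_id}.pkl")

    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)   # fast path on any worker: roughly a disk read

    state = build_fn()              # the 15-20s "hard stuff", done once per user
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return state
```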

Serverless webhook for executionTimeout

Hi, we've just added an executionTimeout for our serverless jobs. I understand that when you supply a webhook, a request is sent when a job is completed. Is it possible to send a webhook request when the executionTimeout is hit as well? Ideally we want to update our DB when a job has completed or has failed (due to taking too long)...
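
For context, a minimal sketch of how the webhook and executionTimeout are attached to a /run request (the endpoint ID, API key, and the policy field name are assumptions based on the public serverless API); whether the webhook also fires when the timeout kills the job is exactly the open question here:

```python
import requests

ENDPOINT_ID = "ENDPOINT_ID"   # placeholder
API_KEY = "YOUR_API_KEY"      # placeholder

payload = {
    "input": {"prompt": "hello"},
    "webhook": "https://example.com/runpod-callback",  # called when the job finishes
    "policy": {"executionTimeout": 120000},             # ms; field name assumed
}

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
print(resp.json())  # expected: {"id": ..., "status": "IN_QUEUE"}
```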

Is there any way to do dynamic batching?

Say I have a vision model deployed and I send 5 images within x time; is there a way to actually stack the images, pass them through the model together, and return the 5 responses? I was able to find concurrent handlers etc., but nothing about actual batching (other than sending them all in the same request, of course).
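
Nothing in this thread points to a built-in for that, so here is a hand-rolled sketch under assumptions: an async concurrent handler parks each request in an asyncio queue, and a background task flushes the queue every x milliseconds (or once it reaches a maximum batch size) through a single batched model call. The model call is a placeholder, and the concurrency_modifier option name is assumed.

```python
import asyncio

import runpod

MAX_BATCH = 5       # flush once this many requests are waiting...
MAX_WAIT_S = 0.05   # ...or after this much time has passed

_queue = None  # created lazily on the worker's event loop

def run_model_batch(images):
    # Placeholder for the real batched forward pass over a stack of images.
    return [f"prediction for {image}" for image in images]

async def _batcher():
    loop = asyncio.get_running_loop()
    while True:
        batch = [await _queue.get()]              # block until at least one request arrives
        deadline = loop.time() + MAX_WAIT_S
        while len(batch) < MAX_BATCH and loop.time() < deadline:
            try:
                batch.append(await asyncio.wait_for(_queue.get(), deadline - loop.time()))
            except asyncio.TimeoutError:
                break
        results = run_model_batch([image for image, _ in batch])
        for (_, future), result in zip(batch, results):
            future.set_result(result)

async def handler(job):
    global _queue
    loop = asyncio.get_running_loop()
    if _queue is None:                            # start the batcher lazily on this loop
        _queue = asyncio.Queue()
        loop.create_task(_batcher())
    future = loop.create_future()
    await _queue.put((job["input"]["image"], future))
    return await future                           # resolved when the batch is flushed

runpod.serverless.start({
    "handler": handler,
    # Assumed option: lets the worker accept several jobs at once so batching can happen.
    "concurrency_modifier": lambda current: MAX_BATCH,
})
```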

Started getting a lot of these "Failed to return job results" errors. Outage?

```json { "dt": "2024-02-15 08:20:07.490148", "endpointid": "1o6zoaofipeyuh", "level": "error",...