Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Does RunPod serverless handler support FastAPI?

I am trying to migrate an existing FastAPI application that runs an ML model to RunPod serverless. Does the serverless handler, which needs to be dockerized, support FastAPI?
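
As far as the handler contract goes, the worker registers a Python handler function via the RunPod SDK rather than serving FastAPI routes directly, so the usual approach is to wrap the inference code your routes already call. A minimal sketch, where predict() is a placeholder for your own inference function:

```python
# Minimal RunPod serverless handler sketch. predict() stands in for whatever
# inference function your FastAPI route currently calls.
import runpod

def predict(prompt: str) -> str:
    # Placeholder for your existing model inference code.
    return f"echo: {prompt}"

def handler(job):
    # RunPod delivers the request body under job["input"].
    job_input = job["input"]
    return {"output": predict(job_input.get("prompt", ""))}

runpod.serverless.start({"handler": handler})
```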

What is the meaning behind template on serverless?

I would like to understand how to create a serverless RunPod environment that runs vLLM to host an LLM. What is the purpose of a template? Although it's optional, it seems a template can be pre-created and reused. What is the meaning behind templates? Thanks...

Poor upload speed

I am uploading large files (2 GB) from a serverless endpoint in CA-MTL-1. I get consistently low speeds and timeouts. The GPU is attached to a network volume, if that matters. I copy the file from the network volume to the tmp directory before uploading. I measured my upload speed and it's about 30 MB/s, which is low considering the server advertises 1000 Mbps+ upload speeds...
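
For what it's worth, one thing that sometimes helps a slow single stream is letting boto3 split the upload into concurrent multipart chunks. A rough sketch, assuming an S3-compatible client; the bucket, key, file path, and tuning values are placeholders:

```python
# Sketch: concurrent multipart upload with boto3. Bucket, key, and file path
# are placeholders; add endpoint_url/credentials for non-AWS providers.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,   # switch to multipart above 8 MB
    multipart_chunksize=16 * 1024 * 1024,  # 16 MB parts
    max_concurrency=8,                     # upload parts in parallel
)

s3.upload_file("/tmp/output.bin", "my-bucket", "outputs/output.bin", Config=config)
```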

Costs for increasing worker count

Hello, I wanted to know what the cost of increasing my worker count to 30 is. I know that it requires $100 in credit to increase to 10 workers, but what comes after that? Is the next upgrade to 15 workers or to 20? And how much would that cost?...

Expose S3 boto client retry config for endpoints (Dreambooth, etc)

Note: posting this here since I can't post in the "feature requests" section. I'm currently experiencing (and have in the past) rate-limiting issues when trying to upload the result of a job to my S3 bucket. In my case it is with Backblaze B2 and the infamous "ServiceUnavailable, no tomes available" error, but this has also happened with other providers. Problem: when using the Dreambooth endpoint with an S3 bucket set up to upload the trained model, sometimes the S3 service fails due to rate limiting and the whole job ends in error, losing all training. AFAIK, this is expected behavior and the caller is supposed to retry the request with exponential backoff...
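
Until such a setting is exposed on the endpoint, a botocore Config with built-in retries is roughly what the request maps to. A sketch, assuming a Backblaze B2 S3-compatible endpoint; the URL, paths, and retry values are illustrative:

```python
# Sketch: enabling botocore's built-in retries with backoff. The endpoint URL
# and retry values are illustrative, not a recommendation.
import boto3
from botocore.config import Config

retry_config = Config(retries={"max_attempts": 10, "mode": "adaptive"})

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.us-west-000.backblazeb2.com",  # placeholder B2 endpoint
    config=retry_config,
)

s3.upload_file("/workspace/model.ckpt", "my-bucket", "models/model.ckpt")
```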

Costs have skyrocketed

Please explain how my costs have skyrocketed while nothing has changed with my endpoints and, if anything, my usage has gone down.

Logging stops at a random point

The worker runs just fine, but after it finishes I cannot see all the debug prints that I put in my code. It seems like it stops logging at a random point... I saw something about logs being dropped if they are too verbose, could that be it?
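
If the truncation is buffering rather than rate limiting, flushing stdout per line may help; this is only a guess. A small sketch:

```python
# Sketch: flush each debug print so buffered output isn't lost when the worker exits.
import sys

def debug(msg: str) -> None:
    print(msg, flush=True)  # flush per line instead of relying on process exit

debug("finished post-processing")
sys.stdout.flush()  # one last flush before the handler returns
```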

Hosted Serverless

I looked at your website and I am not sure if you host models in a multi-tenant style. I want to be able to call some common models like Mixtral at low cost without paying for a dedicated server. Do you offer this?

A way to know if worker is persistent ("active") or not

It would be a great help if there were some way from inside the code (e.g. an environment variable) to know whether the model is running on a persistent worker or not. Example use case: if the worker is persistent, I can compile the model; it takes ~20 minutes, but it is worth it since my users get an almost 50% latency reduction. However, this is impossible for an ephemeral worker, as it would take too long to initialize. Is there any way to do this?...
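
I don't believe there is a built-in flag for this, so one possible workaround is to split active and flex traffic across two endpoints and set a custom environment variable on the active one yourself. A sketch; IS_ACTIVE_WORKER is a hypothetical variable you would define, not something RunPod provides:

```python
# Sketch of a workaround: set a custom env var on the endpoint that runs your
# active workers. IS_ACTIVE_WORKER is hypothetical, not a built-in RunPod variable.
import os

def load_model():
    if os.environ.get("IS_ACTIVE_WORKER") == "1":
        # Persistent worker: spend the ~20 minutes compiling for lower latency.
        print("compiling model for the active worker...")
    else:
        # Ephemeral worker: skip compilation to keep cold starts short.
        print("loading uncompiled model...")

load_model()
```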

Comparing Costs: Single vs. Dual GPU Configuration in Serverless Computing

If I opt for a serverless 48 GB GPU and choose 2 GPUs per worker, will the cost be the same as if I chose 1 GPU per worker?

Can't set up the serverless vLLM for the model.

Please help solve the problem. When trying to make a request, these errors are logged: ▲ 2024-04-24 18:25:10.089 [hrkxm58yz2r504]...

error creating container: create or lookup container: container create: exit st

I keep getting this error on my serverless pods. I didn't change anything compared to previous days. FYI, I am hosting my models on AWS ECR. 2024-04-25T17:16:52Z error creating container: create or lookup container: container create: exit status 1...

API not properly propping up?

Hi, I'm new to RunPod. I deployed a fine-tuned LLM using the vLLM template in RunPod. I'm having problems with the API: when I fire requests using the OpenAI chat completions API, it gets stuck processing the request for a couple of minutes and returns 500. When I hit the API using the RunPod endpoint console and afterwards retry the request that previously 500'd, it works as expected in about 8 seconds. I am doing this without using a handler; am I doing something wrong?

Can local development use Runpod Secrets?

I discovered that RunPod serverless has this Secrets feature. I want to use it to store values like environment variables.
{{ RUNPOD_SECRET_hello_world }}
{{ RUNPOD_SECRET_hello_world }}
...
Solution:
Yeah, as far as I know, we don't provide support for this, but setting environment variables locally shouldn't be too hard?
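
A minimal sketch of that suggestion, assuming you map the secret into an environment variable (e.g. HELLO_WORLD, an example name) in your endpoint template via {{ RUNPOD_SECRET_hello_world }}, and export the same variable when running locally:

```python
# Sketch: read the value from an env var. On RunPod the template maps the secret
# into it; locally you export it yourself. HELLO_WORLD is an example name.
import os

hello_world = os.environ.get("HELLO_WORLD")
if hello_world is None:
    raise RuntimeError("HELLO_WORLD is not set; export it for local development")
```

Locally, something like `export HELLO_WORLD=some-value` before running the handler should then behave the same way.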

How does the vLLM serverless worker support the OpenAI API contract?

I wonder how a serverless worker can implement a custom API contract if it is mandatory that the request be a POST and the payload be JSON with a mandatory input field. I understand that the vLLM worker (https://github.com/runpod-workers/worker-vllm) solved this and implements the OpenAI API endpoints, but I don't get how it bypassed these limitations...
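
For calling it, the vLLM worker exposes an OpenAI-compatible route on the endpoint URL (see its README); as far as I can tell, RunPod's gateway translates those requests into regular jobs for the handler, so the POST-plus-input contract still holds underneath. A sketch using the openai client, where the endpoint ID, API key, and model name are placeholders:

```python
# Sketch: calling a worker-vllm endpoint through its OpenAI-compatible route.
# ENDPOINT_ID, the API key, and the model name are placeholders.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_RUNPOD_API_KEY",
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",
)

resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # whatever model the worker serves
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```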

serverless webhook sometimes does not receive requests

When using serverless, I set up the webhook callback address, but my business system occasionally doesn't receive a callback request from serverless. Can I see webhook logs in RunPod's backend?
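
For reference, a sketch of how the webhook is registered when submitting a job; the endpoint ID, API key, and callback URL are placeholders. I believe failed callbacks are retried only a limited number of times, so the receiving system should return a 2xx promptly.

```python
# Sketch: registering a webhook when submitting a job. The endpoint ID, API key,
# and callback URL are placeholders.
import requests

resp = requests.post(
    "https://api.runpod.ai/v2/ENDPOINT_ID/run",
    headers={"Authorization": "Bearer YOUR_RUNPOD_API_KEY"},
    json={
        "input": {"prompt": "hello"},
        "webhook": "https://example.com/runpod-callback",  # called when the job completes
    },
    timeout=30,
)
print(resp.json())
```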

No active workers after deploying New Release

Had 5 active workers. Deployed a new release, which was quickly pulled. Shortly afterwards, all workers went into the "Initializing" state, fully shutting down the endpoint. I would expect some workers to stay active so the endpoint can keep handling requests. This is not the first time this has happened. As of now, this feature is not stable enough to use on production pods...

stuck in "stale worker", new release with new image tag not deploying "latest worker"

I've noticed this problem several times. Sometimes, when I push a new container image version, click "New release", and enter my new container with the new version tag, it won't spin up "latest workers"; instead, it just shows "stale workers 2" and stays stuck there forever. As you can see in the screenshot, I already bumped the image to "weixuanf/runpod-worker-comfy:nslog15", but it is still stuck on stale workers using "weixuanf/runpod-worker-comfy:nslog13". To work around this, if I change the number of max workers in "Edit endpoint", it sometimes (not always) starts initializing new "latest workers"...

use auto1111 with my own sdxl-lightning models

Hi, I wanted to install the auto1111 SD serverless worker with my own custom model. What is the best way to achieve this? Can I simply install the preset and somehow upload the model to the worker (if yes, how?), or do I need to clone the repo and change some settings, etc.? What is the best way to do this? Thanks in advance.

Can anyone help me set up a serverless endpoint?

How can we test our RunPod endpoint via the RunPod API? Is any Python script available?
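
A minimal sketch of a test script against the REST API; the endpoint ID, API key, and input payload are placeholders, and the input schema depends on your handler:

```python
# Sketch: minimal test of a serverless endpoint via the REST API. The endpoint ID,
# API key, and input payload are placeholders; the input schema depends on your handler.
import requests

API_KEY = "YOUR_RUNPOD_API_KEY"
ENDPOINT_ID = "YOUR_ENDPOINT_ID"

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",  # use /run for async + polling
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "Hello from a test script"}},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```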