Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

How to get around the 10/20 MB payload limit?

For use cases such as training LoRAs with Stable Diffusion, where a user could upload tens of photos, 10/20 MB is quite small. This is especially true because you have to convert each image to base64 before sending it to the endpoint, which inflates its size by roughly a third. My app requires the user to upload photos of themselves for training purposes, and if I can't find a way around the 10 MB payload limit, I just realized I can't use Runpod's serverless GPUs. Are there any clever ways of getting around this payload limit?...
Solution:
Upload your photos to cloud storage so your serverless workers can download them from a link. The limits are fixed and there is no way around them; you must download the resources from a link instead.
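For anyone hitting this, a minimal sketch of the link-based flow, assuming an S3 bucket (the bucket name, endpoint ID, and input schema below are placeholders, not Runpod-prescribed names): the client uploads the photos and sends only short presigned URLs in the payload, which the worker then downloads.

```python
import boto3
import requests

s3 = boto3.client("s3")
BUCKET = "my-training-photos"  # placeholder bucket name

def submit_job(photo_paths, endpoint_id, api_key):
    """Upload photos to S3 and send only their presigned URLs to Runpod."""
    urls = []
    for i, path in enumerate(photo_paths):
        key = f"uploads/photo_{i}.jpg"
        s3.upload_file(path, BUCKET, key)
        # A presigned URL lets the worker download without AWS credentials.
        urls.append(s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": BUCKET, "Key": key},
            ExpiresIn=3600,
        ))
    # The payload now carries a few short strings instead of megabytes of base64.
    resp = requests.post(
        f"https://api.runpod.ai/v2/{endpoint_id}/run",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"input": {"image_urls": urls}},
        timeout=30,
    )
    return resp.json()
```

The same approach works with any storage that can serve a signed or public URL, e.g. Cloudflare R2 or GCS.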

/runsync/ getting this error - {"Error":"bad request: body: exceeded max body size of 10MiB"}

In my app, I need the user to upload photos of themselves. According to the docs here, the payload capacity for /runsync/ is 20 MB: https://docs.runpod.io/docs/serverless-endpoint-urls...

webhook gets called twice

Running runpod==1.5.2. How can I fix this?

Add lora inside a docker image with A1111

Hello, I'm trying to add a LoRA to the Docker image created by the Runpod team. If somebody knows how it needs to be done, could they check a few lines of my Dockerfile? I can't run Docker on my PC, so I want to be pretty sure about the code before running it with a friend. The Dockerfile already provides the steps to install the checkpoint; I tried to mimic that code to get the same result, but I don't know if I put the file in the right place....
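For reference, a minimal sketch of a download step you could invoke from a RUN line in the Dockerfile. It assumes A1111's standard layout, where LoRA files live under models/Lora in the webui root; the /stable-diffusion-webui path and the Civitai URL are placeholders to verify against the actual base image.

```python
# download_lora.py - run during the image build, e.g. RUN python download_lora.py
# Assumes the base image installs A1111 at /stable-diffusion-webui; verify this
# path against the team's Dockerfile before building.
import os
import urllib.request

LORA_DIR = "/stable-diffusion-webui/models/Lora"
LORA_URL = "https://civitai.com/api/download/models/<MODEL_VERSION_ID>"  # placeholder

os.makedirs(LORA_DIR, exist_ok=True)
urllib.request.urlretrieve(LORA_URL, os.path.join(LORA_DIR, "my_lora.safetensors"))
```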

question about the data structure of a serverless endpoint

I have a question about the data structure of a serverless endpoint. I need to build a container with more than one model on it, and it will have a network volume with all the data. The question is: where should I store the virtual environments, package dependencies (pyenv and pipenv), and caches? On the Docker image or on the network volume? Which will give better results in terms of performance and execution time?...
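Not an official recommendation, but a common pattern: bake the Python environment and dependencies into the image (image layers sit on the worker's local disk, so imports are fast), keep the large model weights on the network volume, and cache loaded models at module level so a warm worker doesn't reload them. A sketch, with `load_model` as a placeholder for your framework's loader:

```python
# handler.py - dependencies baked into the image; large weights read from
# the network volume, which mounts at /runpod-volume on serverless.
import runpod

MODEL_ROOT = "/runpod-volume/models"
_cache = {}  # module-level cache: persists across jobs on a warm worker

def load_model(path):
    # Placeholder: substitute your framework's actual loading code here.
    raise NotImplementedError(f"load weights from {path}")

def get_model(name):
    if name not in _cache:
        _cache[name] = load_model(f"{MODEL_ROOT}/{name}")
    return _cache[name]

def handler(job):
    model = get_model(job["input"]["model"])
    return {"output": model.run(job["input"]["prompt"])}

runpod.serverless.start({"handler": handler})
```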

Cold start time

Does anyone know the cold start time of a model hosted on Runpod serverless? The Kandinsky model. 🙂

all 5 workers throttled

What can I do when all 5 of my workers are throttled? It seems that because of this, my jobs occasionally get stuck for a while.

Tips on avoiding hitting this error whilst checking `/status/:job_id` using requests?

The full error is pasted at the bottom. After sending out a request, I'm using requests.get() to check the /status/:job_id endpoint every 3 seconds until the job returns either FAILED or COMPLETED. Unfortunately, one of the requests had a particularly long delay time (234 secs) on top of the 57 secs of execution time....
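One way to make the polling loop resilient is to give each requests.get() its own timeout, swallow transient network errors, and enforce an overall deadline instead of a fixed number of iterations. A sketch (the endpoint ID, API key, and deadline are placeholders):

```python
import time
import requests

def wait_for_job(endpoint_id, job_id, api_key, deadline_s=600):
    """Poll /status/:job_id, tolerating transient network errors."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    start = time.time()
    while time.time() - start < deadline_s:
        try:
            resp = requests.get(url, headers=headers, timeout=10)
            resp.raise_for_status()
            payload = resp.json()
            if payload.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
                return payload
        except requests.RequestException:
            pass  # transient error: back off and retry rather than crash
        time.sleep(3)
    raise TimeoutError(f"job {job_id} did not finish within {deadline_s}s")
```

The overall deadline also covers the long queue delays mentioned above, since a job can sit in IN_QUEUE well past its execution time.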

Newbie question

Hello, I would like to try something like an on-demand video transcoding server for some projects: I would point it at a video link, the server would fetch the file, transcode it using ffmpeg with hardware compression, and push it to S3-compatible storage (Cloudflare R2). Would that be possible? Any advice on where to start? Thanks a lot in advance....
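This maps well onto a serverless handler. A rough sketch of one, assuming an ffmpeg build with NVENC in the image and R2 credentials in environment variables (the account ID, bucket name, and input schema are placeholders):

```python
import os
import subprocess
import urllib.request

import boto3
import runpod

# R2 is S3-compatible; account ID, bucket, and credential names are placeholders.
r2 = boto3.client(
    "s3",
    endpoint_url="https://<ACCOUNT_ID>.r2.cloudflarestorage.com",
    aws_access_key_id=os.environ["R2_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["R2_SECRET_ACCESS_KEY"],
)

def handler(job):
    src, dst = "/tmp/in.mp4", "/tmp/out.mp4"
    urllib.request.urlretrieve(job["input"]["video_url"], src)
    # h264_nvenc uses the GPU's hardware encoder; requires an ffmpeg build
    # compiled with NVENC support inside the image.
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-c:v", "h264_nvenc", "-preset", "fast", dst],
        check=True,
    )
    key = job["input"].get("output_key", "out.mp4")
    r2.upload_file(dst, "my-bucket", key)
    return {"output_key": key}

runpod.serverless.start({"handler": handler})
```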

Proper way to listen stream

If I understood correctly, the only way to get stream updates is to poll the stream endpoint, as shown in the docs here (https://docs.runpod.io/reference/llama2-13b-chat): `for i in range(10): time.sleep(1); get_status = requests.get(status_url, headers=headers)`...
Solution:
We don't have SSE support yet; we plan to look at that.
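Until SSE support lands, polling /stream/:job_id in a loop is the workaround. A sketch based on the response shape in the linked docs, i.e. a status field plus a stream array of output chunks (verify the field names against the current docs; they are an assumption here):

```python
import time
import requests

def stream_job(endpoint_id, job_id, api_key):
    """Poll /stream/:job_id and yield output chunks as they arrive."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/stream/{job_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        resp = requests.get(url, headers=headers, timeout=10)
        data = resp.json()
        # Assumed response shape: {"status": ..., "stream": [{"output": ...}]}
        for chunk in data.get("stream", []):
            yield chunk.get("output")
        if data.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
```

Usage would be something like `for token in stream_job(...): print(token, end="")`.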

Can we use other SD models (and Loras) on Quick Deploy serverless?

Hello everyone, I tried the quick deploy option for SD Automatic1111, and the generation speed is very good. I tried to modify it to use another model: in the Dockerfile, I replaced the original model with the CyberRealistic model, changing only the line with the Civitai link (you can see the picture). When I tried to build the Docker image I got an issue:...

Is it possible to release a new version via command line?

Instead of the web interface, can we do this via the command line? Thanks!

Increase Worker Max Limit

Hi, I would like to increase my max worker amount. Would it be possible for someone from the team to reach out via DMs? Thank you!

Empty Tokens Using Mixtral AWQ

```
2024-01-20T00:36:26.942667713Z
2024-01-20T00:36:26.943297221Z ==========
2024-01-20T00:36:26.943372701Z == CUDA ==
2024-01-20T00:36:26.943619654Z ==========
...
```

Intermittent Slow Performance Issue with GPU Workers

I am currently encountering an intermittent issue with some GPU workers exhibiting significantly slower performance. I have measured the time taken for a specific task on a designated type of GPU worker (4090 24GB). Typically, when I send the exact same payload input to the endpoint, the execution time is around 1 minute.

However, I have observed that occasionally a worker becomes exceptionally slow. Even with the same payload input, Docker image, tag, and GPU type, the execution time extends to a few hours. Notably, during these occurrences, the GPU utilization remains constantly at 0%. Upon reviewing the output log, it is evident that the inference speed is unusually slow when the affected worker is in operation.

Have any of you experienced a similar problem, and if so, how did you resolve it? Your insights and assistance in addressing this issue would be greatly appreciated. Thank you....
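One way to triage this is to log the GPU's state at the start of every job, so the multi-hour runs can be correlated with a specific worker in the endpoint logs and that worker reported or terminated. A sketch using standard nvidia-smi query flags; the handler body is a placeholder:

```python
import subprocess

import runpod

def log_gpu_state():
    # Report which GPU the worker has plus its current utilization and
    # memory use, using nvidia-smi's standard query interface.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,utilization.gpu,memory.used",
         "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    print(f"GPU state: {out.stdout.strip()}")

def handler(job):
    log_gpu_state()  # correlate slow jobs with a specific worker in the logs
    # ... run inference as usual (placeholder) ...
    return {"ok": True}

runpod.serverless.start({"handler": handler})
```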

Why is the GPU not full?

I made 4 requests, but only 3 workers are running and 1 request is waiting, even though my worker limit is 5!

All my serverless instances are "initializing" forever

I am trying my own template (from the tutorial) and also trying to spin up a canned template (the Llama one), but I just get an endless "Initializing" (see image). Can someone help? This is really holding me back.

Is there any way to restart the worker when SSHed into the device?

Hey all, I have a reserved instance and I'm debugging some issues running on the GPU. I can SSH into the device and change some code. Is there an easy way for me to restart the worker process? Thanks!

OSError: [Errno 122] Disk quota exceeded

I'm loading a 32 GB LLM. I have a 40 GB container disk and a 100 GB network volume (it was set automatically). During the LLM download, it is interrupted with OSError: [Errno 122] Disk quota exceeded...
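If the model comes from the Hugging Face Hub, one likely cause is that the download lands in the default cache on the smaller container disk rather than on the 100 GB network volume. A sketch of the usual fix, assuming the serverless mount point /runpod-volume (the model ID is a placeholder); the environment variable must be set before transformers is imported:

```python
import os

# Point the Hugging Face cache at the network volume (mounted at
# /runpod-volume on serverless) so the 32 GB download doesn't land on the
# smaller container disk. Must be set before importing transformers.
os.environ["HF_HOME"] = "/runpod-volume/huggingface"

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("my-org/my-32gb-llm")  # placeholder ID
```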

Do the serverless SD APIs have the NSFW filter turned on?

Looking to see if the NSFW filter is turned on for the API endpoints.