Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning, and GPUs!

I'm confused: does a RunPod serverless endpoint charge for the whole month, or based on usage?
Solution:
Based on usage: you're charged for the time your workers are running.

Unit for Pricing

I never paid attention to this before: what is the unit here, seconds or minutes?
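Both pricing questions above come down to the same thing: serverless is metered on worker runtime, and RunPod lists serverless rates per second on its pricing page. A toy calculation (the rate here is illustrative, not a quoted price):

```python
# Toy per-second serverless billing math; the rate is illustrative,
# not a real quoted price -- check the pricing page for figures.
price_per_second = 0.00048   # $/s for one worker (made-up figure)
seconds_running = 3_600      # total worker runtime
print(f"${price_per_second * seconds_running:.2f}")  # -> $1.73
```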

error downloading model? TheBloke/Mixtral-8x7B-MoE-RP-Story-AWQ

```
2335.9 Traceback (most recent call last):
2335.9   File "/download_model.py", line 48, in <module>
2335.9     tokenizer_folder = download_extras_or_tokenizer(tokenizer, download_dir, revisions["tokenizer"])
2335.9   File "/download_model.py", line 10, in download_extras_or_tokenizer
2335.9     folder = snapshot_download(...
```
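
The traceback dies inside snapshot_download, which usually points at the Hugging Face download step (auth on a gated repo, a bad revision, or a network failure) rather than the handler itself. A minimal sketch of what that call typically looks like, assuming huggingface_hub; the revision and directory here are illustrative:

```python
# Sketch: the Hugging Face download the traceback cuts off inside.
# Auth failures and wrong revisions both surface as errors here.
from huggingface_hub import snapshot_download

folder = snapshot_download(
    "TheBloke/Mixtral-8x7B-MoE-RP-Story-AWQ",
    revision="main",      # illustrative; the worker passes its own
    local_dir="/models",  # illustrative download dir
)
print(folder)
```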

About Queueing

For jobs that stay in the queue for whatever reason (probably because all workers are already busy): do those jobs stay in the queue for a limited time, or is it unlimited? Is there a limit on the number of items in the queue, or is it unlimited? If there are limits, what are they, and can they be changed in any way?...
Solution:
Jobs stay in the queue as defined by the TTL policy. You can change it; by default it's 24 hours.
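A sketch of overriding that TTL per job. As I read the docs, the execution policy rides alongside `input` and takes milliseconds; treat the field names and units as assumptions to verify:

```python
# Sketch: per-job queue TTL via the execution policy.
# Values are milliseconds per my reading of the docs -- verify.
import requests

requests.post(
    "https://api.runpod.ai/v2/<endpoint_id>/run",
    headers={"Authorization": "Bearer <API key>"},
    json={
        "input": {"prompt": "hello"},
        "policy": {"ttl": 3_600_000},  # expire the queued job after 1 hour
    },
)
```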

Network Storage Cache

In a serverless invocation, is it possible to store files in network storage and have those items be accessible between worker invocations? Or is the runpod volume directory a copy of the network volume...
Solution:
You can attach your network volume to your serverless endpoint under Advanced settings. It gets mounted to the serverless workers at /runpod-volume.
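Since the mount persists across invocations, you can use it as a shared cache. A minimal sketch, assuming the volume is attached as described; download_model is a hypothetical helper standing in for your own download logic:

```python
# Sketch: cache downloads on the network volume so later invocations
# (on any worker with the volume attached) can reuse them.
import os

CACHE_DIR = "/runpod-volume/model-cache"  # persists across invocations

def get_model_path(name: str) -> str:
    path = os.path.join(CACHE_DIR, name)
    if not os.path.exists(path):
        os.makedirs(CACHE_DIR, exist_ok=True)
        download_model(name, path)  # hypothetical download helper
    return path
```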

About volumes and images

Hi all, great product so far! I have a few questions (apologies if this is already somewhere in the docs; I haven't found it). I'm also not sure if this should go here or somewhere else....

API to Text Generation Web UI

Hello! I want to upload a model using a serverless pod running the Text Generation Web UI. I know I will end up with an endpoint. I would like to create a few "characters" there, but how can I interact with the model from my code? Is there some kind of POST endpoint I need to use to select a character and run my model? I am currently using OpenAI GPT-4 for this but would like to switch over. ...
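For the "POST endpoint" part: serverless endpoints are called over HTTP at /run (async) or /runsync (blocking). A minimal sketch; the URL pattern and auth header follow RunPod's serverless API, while the "prompt"/"character" fields are illustrative, since the input schema is whatever your worker's handler expects:

```python
# Sketch: calling a serverless endpoint from your own code.
# The "input" payload shape depends entirely on your handler.
import requests

ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "Hello!", "character": "Assistant"}},  # illustrative fields
    timeout=300,
)
print(resp.json())
```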

network volume venv serverless

When trying to access venv/src pip installs from a network volume, I get ModuleNotFoundError. When the same network volume is mounted on a Pod, it works just fine. Any tips?
Solution:
The venv needs to be loaded from the same directory path, I think. On serverless you should instead bake the dependencies into the Docker image.
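The path mismatch is plausible: a venv records absolute paths, and the volume mounts at /workspace on Pods but /runpod-volume on serverless. If baking the dependencies into the image isn't an option yet, one heavily hedged workaround is to put the venv's site-packages directly on sys.path (the exact path below is an assumption; entry-point scripts will still be broken):

```python
# Workaround sketch: import packages from the volume's venv without
# "activating" it. Path is an assumption -- adjust to your Python
# version and venv location on the volume.
import sys
sys.path.insert(0, "/runpod-volume/venv/lib/python3.10/site-packages")

import my_module  # hypothetical package installed in that venv
```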

Container start command behavior

I am starting a serverless container using the image runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04. As the start command I am adding: mv /runpod-volume /workspace && bash /workspace/runpod/pod-startup.sh. I expect /runpod-volume to be renamed to /workspace and the script to then run, but instead this is what I get:...
Solution:
@papanton Not familiar with what you're trying to do, but maybe try using a symlink instead.
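The symlink suggestion makes sense: mv across mount points has to copy every file (and /runpod-volume is a mount, not a plain directory), whereas a link is instant. A sketch of the same idea in Python, assuming /workspace is absent or an empty leftover directory; `ln -s /runpod-volume /workspace` in the start command is the shell equivalent:

```python
# Sketch: point /workspace at the mounted network volume instead of
# trying to move the mount itself.
import os

if not os.path.islink("/workspace"):
    # drop an empty placeholder dir if the image ships one
    if os.path.isdir("/workspace") and not os.listdir("/workspace"):
        os.rmdir("/workspace")
    os.symlink("/runpod-volume", "/workspace")
```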

Docker image and SD Models

My docker image for ComfyUI is going to contain several SD and SDXL models. Is it best to include those as part of the image or have them downloaded on startup?
Solution:
Make them part of the Docker image, @Phando.

Uploading file to serverless

If I want to upload a file to serverless, what is the way to do this? Somewhere I saw a suggestion to use an intermediate base64 representation. Setting aside the latency of encoding and decoding, what if my file is very large? I would have to convert it to base64 and append a very large string to my request, which is not very neat. Also, as the file grows larger we would be sending very large strings; is there any throughput loss compared to a multipart upload?
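On throughput: base64 inflates the payload by roughly 33%, so there is a real cost versus binary multipart. A common pattern (not RunPod-specific) is to skip inlining entirely: upload the file to object storage and pass only a URL in the job input for the worker to download. A sketch assuming an S3-compatible bucket and boto3; bucket, key, and field names are illustrative:

```python
# Sketch: send large files by reference instead of base64-in-JSON.
import boto3, requests

s3 = boto3.client("s3")
s3.upload_file("big_input.wav", "my-bucket", "jobs/big_input.wav")
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "jobs/big_input.wav"},
    ExpiresIn=3600,  # link valid for 1 hour
)

# The worker's handler then downloads job["input"]["file_url"].
requests.post(
    "https://api.runpod.ai/v2/<endpoint_id>/run",
    headers={"Authorization": "Bearer <API key>"},
    json={"input": {"file_url": url}},
)
```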

GraphQL: How to get the runtime of a serverless pod through the api stateless?

The goal is to get the runtime (call it active time) since start. The constraint is to do it statelessly, meaning that persisting timestamps in my environment is not an option. Using GraphQL https://graphql-spec.runpod.io/#definition-Pod: ...
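A sketch of the stateless lookup, assuming the schema in the linked spec; runtime.uptimeInSeconds is the closest field to "active time since start" that I can see, so verify it matches your definition of runtime:

```python
# Sketch: stateless pod runtime lookup via the GraphQL API.
# Field names follow the linked spec -- verify against it.
import requests

query = """
query Pod {
  pod(input: {podId: "POD_ID_HERE"}) {
    id
    runtime { uptimeInSeconds }
  }
}
"""

resp = requests.post(
    "https://api.runpod.io/graphql",
    params={"api_key": "YOUR_API_KEY"},
    json={"query": query},
)
print(resp.json()["data"]["pod"]["runtime"]["uptimeInSeconds"])
```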

2x A100 / 3x 48 GB on Serverless

Hi @flash-singh, a while back we talked about having multiple GPUs on serverless, and then you introduced 2x 48 GB. Now there are larger models out, like Mixtral 8x7B, which requires a minimum of 100 GB, but ideally 120 GB, of VRAM to serve. Do you have any plans to expand capacity to allow for this in your serverless products? Perhaps an easier route is to allow 3x 48 GB GPUs, since that can serve models like Mixtral....

SGLang worker (similar to worker-vllm)

Recently, some progress has been made on efficiently deploying LLMs and LMMs. SGLang is up to 5x faster than vLLM. @Alpay Ariyak, could we port the worker-vllm setup to SGLang? https://github.com/sgl-project/sglang https://lmsys.org/blog/2024-01-17-sglang/...

I need to speak about my credits in my account. Thanks

I need to speak with someone from management/support about my credits in my account. Thanks...

Insanely Fast Whisper

I am trying to get this to work on RunPod, to try and eke out some more speed over your faster-whisper, which I currently use and love. It seems like this effort was started? https://github.com/runpod-workers/worker-insanely-fast-whisper/tree/main @Justin Merrell @Marut
...

Trying to deploy Llava-Mistral using a simple Docker image, receive both success & error msgs

I am using a simple Docker script to deploy Llava-Mistral. In the system logs, it creates the container successfully. In the container logs, I get the following:
2024-02-05T01:52:10.452447184Z [FATAL tini (7)] exec docker failed: No such file or directory
Script:...

Worker hangs for really long time, performance is not close to what it should be

Hi, I'm working with a transcription and diarization endpoint. The Docker image works great; I tested it locally and also inside a worker. I SSH into the worker and test using:

```
python handler.py --test_input '{"input": {"endpoint": "transcribe_option", "file_path": "dev/tmp/test_files/FastAPI_Introduction_-_Build_Your_First_Web_App_-_Python_Tutorial.mp4", "is_diarization": true}}'
```

The processing time is around 1 minute for this video (11 min), which works great. These are the logs I get from running the same request inside the worker -> Message.txt appended....
Solution:
Take a look at our implementation of Faster Whisper: https://github.com/runpod-workers/worker-faster_whisper/blob/main/src/rp_handler.py. Your code is already blocking; async is likely just introducing complexity...
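For reference, the blocking-handler shape that the linked rp_handler.py follows. runpod.serverless.start is the SDK's documented entry point; transcribe here is a hypothetical stand-in for your own pipeline:

```python
# Minimal blocking handler in the shape the linked worker uses.
# transcribe() is a hypothetical stand-in for your own code.
import runpod

def handler(job):
    params = job["input"]
    result = transcribe(params["file_path"], params.get("is_diarization", False))
    return {"transcript": result}

runpod.serverless.start({"handler": handler})
```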

$0 balance in my account

Hi, I had about $25 USD last night in my account. This morning I received a message to replenish my account as it was empty. I would like to understand what happened, as I do not have a running pod or serverless instance. Thanks....
Solution:
It is likely that your worker repeatedly started up but was unsuccessful; this would still have resulted in your account being charged. If you DM me, I can provide you a small credit to verify whether this was the case.

vllm + Ray issue: Stuck on "Started a local Ray instance."

Trying to run TheBloke/goliath-120b-AWQ on vLLM + RunPod with 2x 48 GB GPUs:

```
2024-02-03T12:36:44.148649796Z The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
2024-02-03T12:36:44.149745508Z 0it [00:00, ?it/s]...
```
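
For context, tensor_parallel_size=2 is what makes vLLM start that local Ray instance. A minimal sketch of the relevant engine settings, assuming a vLLM version of that era; if startup stalls right after the Ray line, shared-memory limits or GPU visibility inside the container are common suspects (an assumption, not a confirmed diagnosis):

```python
# Sketch: the vLLM settings behind "Started a local Ray instance".
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/goliath-120b-AWQ",
    quantization="awq",       # matches the AWQ checkpoint
    tensor_parallel_size=2,   # splits the model across the 2x 48 GB GPUs
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=16)))
```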