RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

Can you run a FastAPI GPU project on RunPod serverless?

I have a FastAPI project that was hosted on SageMaker. Now I plan to move it to RunPod. Can someone guide me on how to do it?...
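For reference, a minimal sketch of the usual migration path, assuming the RunPod Python SDK: serverless workers don't serve HTTP themselves, so instead of running the FastAPI app you wrap the route's logic in a handler function (`run_inference` below is a hypothetical stand-in for whatever your route currently calls):

```python
import runpod

def run_inference(payload: dict) -> dict:
    # Your existing GPU inference code, previously behind a FastAPI route.
    return {"result": "ok"}

def handler(job):
    # RunPod delivers the request body under job["input"].
    return run_inference(job["input"])

runpod.serverless.start({"handler": handler})
```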

Execution Time Greater Than 30000s

Why was the execution time so long, even greater than 30000s? I had to cancel manually because the task queue was completely unable to run....

Serverless tasks get stopped without a reason

Hey everyone! I'm running a serverless function which starts a Docker container that runs a Python program, and for some reason, sometimes while the Python program is running, the container gets stopped:
2024-10-27T19:50:00Z start container for xxxxx begin
2024-10-27T19:50:39Z stop container 5f797326f14b2a255f5363623485299b4f911fbba8b8b60e3daf44908c43980f
2024-10-27T19:50:39Z remove container...

Serverless Real-World Billing (Cold Start, Execution, Idle)

I understand that RunPod Serverless compute is billed as: Cold Start Time + Execution Time + Idle Timeout. Can you help clarify how this applies in real-world settings with sporadic usage? For example:...
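For context, a back-of-envelope sketch of how one sporadic request might be billed under that formula. The per-second rate and timings below are hypothetical placeholders, not actual RunPod pricing:

```python
# Hypothetical numbers for illustration only; check current RunPod pricing.
rate_per_sec   = 0.00044   # assumed $/s for the chosen GPU
cold_start_s   = 20        # container pull + model load (first request only)
execution_s    = 8         # actual inference time
idle_timeout_s = 5         # worker stays warm after finishing

billed_s = cold_start_s + execution_s + idle_timeout_s
print(f"billed: {billed_s}s -> ${billed_s * rate_per_sec:.5f}")
# A follow-up request arriving inside the idle window skips the cold start,
# so with sporadic traffic the cold start dominates each billing cycle.
```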

Cannot load symbol cudnnCreateTensorDescriptor

I encountered this error when I deployed my Whisper code in a serverless environment. What is the recommended image for running the Whisper models 'base' and 'large-v3'?

How to send an image as a prompt to vLLM?

Hi there, I am new to RunPod and facing an issue sending an image to my RunPod serverless endpoint. The docs mention how an image can be received, but I want to send one. I am using a Qwen2-VL model which accepts an image and a text prompt. I am able to send text but not the image. Please help me with this; it's for an assignment due before a deadline. Thank you, any help would be appreciated....
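For reference, one common pattern (a sketch, assuming the endpoint runs RunPod's vLLM worker with its OpenAI-compatible route enabled) is to embed the image as a base64 data URI in an `image_url` content part. `ENDPOINT_ID`, `API_KEY`, and the model name are placeholders:

```python
import base64
import requests

ENDPOINT_ID = "your-endpoint-id"    # placeholder
API_KEY = "your-runpod-api-key"     # placeholder

# Encode the local image as a base64 data URI.
with open("example.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "Qwen/Qwen2-VL-7B-Instruct",  # whatever model the worker serves
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    },
    timeout=300,
)
print(resp.json())
```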

Any good tutorials out there on setting up an SD model from Civitai on RunPod serverless?

I've been trying to set up an SD model from Civitai on RunPod serverless for days now, but I keep running into too many errors. Each time I fix one, there's a new one, classic. Are there any good tutorials out there on setting up an SD model from Civitai on RunPod serverless?...
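Absent a tutorial, here is a minimal handler sketch, assuming the Civitai checkpoint (a single `.safetensors` file) is already baked into the image or on a network volume at a path like the hypothetical `MODEL_PATH` below; `from_single_file` is the diffusers loader for single-checkpoint files:

```python
import base64
import io

import runpod
import torch
from diffusers import StableDiffusionPipeline

MODEL_PATH = "/models/my_civitai_model.safetensors"  # hypothetical path

# Load the single-file checkpoint once, at worker start.
pipe = StableDiffusionPipeline.from_single_file(
    MODEL_PATH, torch_dtype=torch.float16
).to("cuda")

def handler(job):
    image = pipe(job["input"]["prompt"]).images[0]
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return {"image_base64": base64.b64encode(buf.getvalue()).decode()}

runpod.serverless.start({"handler": handler})
```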

Does vLLM support quantized models?

Trying to figure out how to deploy this, but I didn't see an option for selecting which quantization I wanted to run. https://huggingface.co/bartowski/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF Thanks!
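vLLM does support several quantization schemes (AWQ, GPTQ, and others) via a `quantization` argument, though GGUF files like the one linked are a different format with much more limited support; an AWQ or GPTQ build of the same model is usually the safer choice. A sketch in plain vLLM (the repo name is just an illustration):

```python
from vllm import LLM

# Load an AWQ-quantized build rather than a GGUF file.
llm = LLM(
    model="TheBloke/Llama-2-7B-AWQ",  # illustrative AWQ repo
    quantization="awq",
)
print(llm.generate("Hello")[0].outputs[0].text)
```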

vLLM error: flash-attn

I get this error; how do I fix it and use vllm-flash-attn, which is faster? The message: Current Qwen2-VL implementation has a bug with vllm-flash-attn inside vision module, so we use xformers backend instead. You can run `pip install flash-attn` to use flash-attention backend.
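The message itself suggests the fix: install flash-attn in the image (e.g. add `pip install flash-attn` to the Dockerfile). As a further, hedged knob, vLLM also honors an attention-backend environment variable, which must be set before vLLM is imported; a sketch:

```python
import os

# Assumption: vLLM reads this variable at import time to pick its backend.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"  # or "XFORMERS"

from vllm import LLM

llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct")
```

Note that per the quoted message, the vision module may still fall back to xformers regardless, since that fallback is deliberate.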

Frequent "[error] worker exited with exit code 0" logs

Hi! I’m working on a project where I'm using RunPod serverless to run my ComfyUI workflow within a Docker image. I attempted to update my Dockerfile and the ComfyUI workflow JSON to save the generated images to my RunPod network volume, but I keep receiving the following logs at the bottom. Any insights or suggestions on how to resolve this would be greatly appreciated! I’ve attached my Dockerfile for reference, and here’s the relevant part of the ComfyUI output path configuration: ```json "332": {...

Worker frozen during long running process

Request ID: sync-f144b2f4-f9cd-4789-8651-491203e84175-u1
Worker ID: g9y8icaexnzrlr
I have a process that should in theory take no longer than 90 seconds...

RunPod GPU use with a Docker image built on a Mac

I am building serverless applications that are supposed to use the GPU. While testing locally, the pieces that kick off GPU-bound functions use the common pattern: device: str = "cuda" if th.cuda.is_available() else "cpu". This is required so that the CPU device is used when running locally on a Mac. I would think that with a Docker image built on a Mac, but with an amd64 machine type specified in the build command, a CUDA GPU would be used once it's deployed on a server with a CUDA base image, but that does not seem to be the case....
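One frequent cause (an assumption worth ruling out, not a confirmed diagnosis): building with `--platform linux/amd64` fixes the CPU architecture but not the PyTorch build, so if the image's requirements resolved to a CPU-only torch wheel, `cuda.is_available()` stays False even on a GPU host. A quick diagnostic to run inside the deployed container:

```python
import torch

print("torch version:", torch.__version__)
print("built with CUDA:", torch.version.cuda)      # None on CPU-only wheels
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```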

A step-by-step guide to deploying HuggingFace models?

So I'm looking for serverless options to host public HuggingFace models for my personal use, but it looks like simply dropping in HuggingFace URLs won't actually work. It showed the following error when I tried to send an API request to my OpenAI Base URL:
ValueError: Unrecognized model in BeaverAI/Cydonia-22B-v2l-GGUF. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deformable_detr, deit, depth_anything, deta, detr, dinat, dinov2, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, falcon_mamba, fastspeech2_conformer, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, git, glpn, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, granite, granitemoe, graphormer, grounding-dino, groupvit, hiera, hubert, ibert, idefics, idefics2, imagegpt, informer, instructblip, instructblipvideo, jamba, jetmoe, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llava, llava_next, llava_next_video, llava_onevision, longformer, longt5, luke, lxmert, m2m_100, mamba, mamba2, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, megatron-bert, mgp-str, mimi, mistral, mixtral, mllama, mobilebert, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, mpnet, mpt, mra, mt5, musicgen, musicgen_melody, mvp, nat, nemotron, nezha, nllb-moe, nougat, nystromformer, olmo, olmoe, omdet-turbo, oneformer, open-llama, openai-gpt, opt, owlv2, owlvit, paligemma, patchtsmixer, patchtst, pegasus, pegasus_x, perceiver, persimmon, phi, phi3, pix2struct, pixtral, plbart, poolformer, pop2piano, prophetnet, pvt, pvt_v2, qdqbert, qwen2, qwen2_audio, qwen2_audio_encoder, qwen2_moe, qwen2_vl, rag, realm, recurrent_gemma, reformer, regnet, rembert, resnet, retribert, roberta, roberta-prelayernorm, roc_bert, roformer, rt_detr, rt_detr_resnet, rwkv, sam, seamless_m4t, seamless_m4t_v2, segformer, seggpt, sew, sew-d, siglip, siglip_vision_model, speech-encoder-decoder, speech_to_text, speech_to_text_2, speecht5, splinter, squeezebert, stablelm, starcoder2, superpoint, swiftformer, swin, swin2sr, swinv2, switch_transformers, t5, table-transformer, tapas, time_series_transformer, timesformer, timm_backbone, trajectory_transformer, transfo-xl, trocr, tvlt, tvp, udop, umt5, unispeech, unispeech-sat, univnet, upernet, van, video_llava, videomae, vilt, vipllava, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_hybrid, vit_mae, vit_msn, vitdet, vitmatte, vits, vivit, wav2vec2, wav2vec2-bert, wav2vec2-conformer, wavlm, whisper, xclip, xglm, xlm, xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, xmod, yolos, yoso, zoedepth\n
Is there a step-by-step guide for beginners to deploy HuggingFace models using RunPod's serverless option?...
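The error above is transformers failing to find a `config.json` in the repo, which GGUF-only repos typically lack, so the vLLM worker cannot identify the architecture; GGUF checkpoints generally need a GGUF-aware runtime (for example llama.cpp) or the original, non-GGUF repo. A hedged pre-flight check you can run locally before deploying:

```python
from transformers import AutoConfig

# Reproduces the deployment error locally if the repo has no config.json.
try:
    AutoConfig.from_pretrained("BeaverAI/Cydonia-22B-v2l-GGUF")
except ValueError as e:
    print("not directly loadable by transformers/vLLM:", e)
```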

Request queued forever

Hi, I am facing a problem while interacting with my RunPod serverless endpoint. When I send the first request, it gets queued and the server doesn't start. Ideally a cold start should take 5 minutes at most, but it's not initializing even after 15-20 minutes. I have already deleted the endpoint and created it again; that fixed the issue once, but I'm getting the same problem now. I am using a Docker image with a custom tag. The logs say worker is ready, starting container, remove container; it throws no error...

Multi-Region Support and Expansion Plans

Hello! Currently, the serverless worker system distributes containers randomly between the US and EU. I'm wondering if there are any plans to allow assigning a specific number of workers to each region (e.g., x workers in the US and x workers in the EU) under a single endpoint in the future. Additionally, would it be possible to implement automatic routing of requests to the nearest region if this feature becomes available? For instance, if an edge function is called from the EU, it would be ideal to route the request to an EU-deployed worker to reduce latency....

Multiple endpoints within one handler

I have had success creating serverless endpoints on RunPod with handler.py files that look like this: `imports ... def handler(job):`...
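For the multi-endpoint question, a common pattern (a sketch, not an official RunPod feature) is to dispatch on a field in the input payload inside a single handler; the `route` key and the task functions here are hypothetical names:

```python
import runpod

def generate(params: dict) -> dict:   # placeholder task
    return {"text": "..."}

def embed(params: dict) -> dict:      # placeholder task
    return {"vector": [0.0]}

ROUTES = {"generate": generate, "embed": embed}

def handler(job):
    payload = job["input"]
    route = ROUTES.get(payload.get("route"))
    if route is None:
        return {"error": f"unknown route: {payload.get('route')}"}
    return route(payload.get("params", {}))

runpod.serverless.start({"handler": handler})
```

A request then selects its "endpoint" with `{"input": {"route": "generate", "params": {...}}}`.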

How to Minimize I/O Waiting Time?

Hello, I'm using serverless RunPod for ComfyUI, where I send and return image URLs, leveraging the Google Cloud bucket SDK. My current flow is: the RunPod handler downloads the image using the URL....
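One general way to hide transfer latency (a sketch under the assumption that requests carry multiple URLs, with placeholder inference/upload steps) is to prefetch downloads and push uploads on worker threads so the GPU never waits on the network:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

pool = ThreadPoolExecutor(max_workers=4)

def download(url: str) -> bytes:
    return requests.get(url, timeout=60).content

def run_inference(data: bytes) -> bytes:   # placeholder for the ComfyUI step
    return data

def upload(result: bytes) -> None:         # placeholder for the GCS upload
    pass

def process(urls: list[str]) -> None:
    # Start all downloads up front so they overlap with GPU work.
    futures = [pool.submit(download, u) for u in urls]
    for fut in futures:
        result = run_inference(fut.result())  # input is usually ready by now
        pool.submit(upload, result)           # upload in the background
```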

Image caching

Hi, are there plans to add caching for user images? I have a pretty big image (18 GB), and after some time it pulls the image again even after it was already initialized, which blocks my processing pipeline.

Network volume used size

How do I check the used size of a network volume in the Storage panel?
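In the meantime, one workaround (a sketch, assuming the volume is attached to a pod at the usual `/workspace` mount point) is to inspect it from inside the pod:

```python
import shutil

# Reports filesystem-level stats for the mount; cross-check with `du`
# if the numbers look off for a shared network volume.
total, used, free = shutil.disk_usage("/workspace")
print(f"used {used / 2**30:.1f} GiB of {total / 2**30:.1f} GiB")
```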