All posts for RunPod
- Help with InstantID
- serverless container disk storage size vs network volume
- Serverless Endpoint failing occasionally
- Serverless can take several minutes to initialise...?
- Maximum size of single output for streaming handlers
- 401 Unauthorized
- Serverless suddenly stopped working
- Balance Disappeared
- Having problems working with the `Llama-2-7b-chat-hf`
- Question about billing
- 2 active workers on serverless endpoint keep rebooting
- Billing increased heavily over the last two days from delay time on RTX 4000 Ada
- Bug prevents changing a Serverless Pod to a GPU Pod
- Error: CUDA error: CUDA-capable device(s) is/are busy or unavailable
- Auto-scaling issues with A1111
- How to make Supir in Serverless?
- Can we use serverless faster Whisper for local audio?
- Is there any method to deploy BERT-architecture models serverlessly?
- NGC containers
- Do endpoints support custom images?
- Webhook failed with 413 or 502 code
- copying a param with shape torch.Size([2048, 1280]) from checkpoint, the shape in current model is t
- Urgent Query
- Need Guidance about LLM Serverless Worker
- Custom image stuck on Initializing with systems logs in loop
- Endpoint Deployment Stuck on Initializing
- server prematurely times out
- Enabling and Viewing Logs for Serverless Jobs in Runpod
- CUFFT_INTERNAL_ERROR on Specific GPU Models While Running WhisperX Model
- Not receiving any webhooks..
- Dreambooth training api
- image uploads + google cloud storage
- Serverless worker loading with stable diffusion pipeline
- Understanding Serverless Pricing
- How to prevent a serverless instance from restarting
- Failed to return job results.
- Why are all GPUs unavailable with "runpodctl project dev" when A40 is available on the RunPod deploy page?
- How to Run Text Generation Inference on Serverless?
- How to download image from s3?
- Is execution timeout per request or per worker execution?
- S3 ENV does not work as described in the RunPod documentation
- GPU type prioritization seems to have stopped working on 13th of March
- How to run Ollama on RunPod Serverless?
- Serverless: module 'gradio.deprecation' has no attribute 'GradioDeprecationWarning'
- Img2txt code works locally but not after deploying
- Docker image using headless OpenGL (EGL, surfaceless platform) OK locally, falls back to CPU in RunPod
- Moving to production on Runpod: Need to check information on serverless costs
- Serverless prod cannot import name "ControlNetModel"
- would not execute a for loop to yield for whatever reason when streaming
- S3 download is quite slow
- No module "runpod" found
- Captured handler exception
- How to load model into memory before the first run of a pod?
- Increase number workers
- High execution time, high amount of failed jobs
- How do I write a handler for /run?
- How do I indicate job status in a handler?
- A6000 serverless worker is failing for an unknown reason.
- Can multiple models be queried using the vllm serverless worker?
- Didn't get response via email, trying my luck here
- Number of requests per second
- I shouldn't be getting charged for this error.
- Inconsistent delay time with generator worker
- ComfyUI Connection refused error
- Delay Time is too long
- is stream a POST endpoint or GET endpoint (locally)?
- Unstable Internet Connection in the Workers
- Streaming is not quite working
- Knowing Which Machine The Endpoint Used
- How to get serverless API region information
- How do I restart a worker automatically or using some script?
- Inconsistent performance of local runpods and production runpods
- base_image in dockerless serverless
- Serverless After Docker Image Pull, Error (failed to register layer: Container ID 197609 cannot...)
- Failed Serverless Jobs drain Complete Balance
- Serverless multi gpu
- How can I make a follow-up question to the endpoint?
- Illegal Instruction
- Serverless cost
- What is the difference between setting execution timeout on an endpoint and setting in the request?
- Serverless custom routes
- What is N95 in serverless metrics?
- venv isolation in network volume
- serverless multi-gpu
- Serverless API Question
- Serverless endpoint: jobs always show 1 in queue, even with 3 workers running
- Serverless Inference
- Serverless can't connect to s3
- how to signup for dev.runpod.io?
- Worker configuration for serverless
- connection closed by remote host
- When using runpodctl project dev to upload a project, is there a speed limit?
- Request Stuck in Queue
- Why does the serverless endpoint download SDXL 1.0 from the Hugging Face Hub so slowly?
- I am getting no response from serverless
- secure connections
- Serverless capability check
- GPU memory usage is at 99% when starting the task.
- Should i wait for the worker to pull my image
- Possible memory leak on Serverless
- Dockerless dev and deploy: does an async handler need to use async?
- Something broken at 1am UTC
- Should I use Data Centers or Network Volume when configuring a serverless endpoint?
- Are stream endpoints not working?
- Postman returns either 401 Unauthorized, or when the request can be sent it returns as Failed, error
- Text-generation-inference on serverless endpoints
- Cold Start Time is too long
- What happened to the webhook graph?
- How can I use more than 30 workers?
- What is the caching mechanism of the RunPod Docker image?
- Hi, is there currently an outage to Serverless API?
- serverless deployment
- How to know when request is failed
- IN-QUEUE Indefinitely
- Costing for Serverless pods without GPU
- Migrating from Banana.dev
- How to deploy the Suno Bark TTS model using RunPod serverless endpoints
- Active worker doesn't get enabled
- Massive spike in executionTime causing my jobs to fail (AGAIN)
- Failed to get job. | Error Type: ClientConnectorError
- Serverless endpoint endlessly on "IN QUEUE" state
- Connection aborted for Faster-Whisper endpoint when using "large-v2" model (Python & NodeJS)
- error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/"
- Can I use websocket in serverless?
- Dockerless CLI can not sync local files to runpod server
- Huge P98 execution time in EU-RO region endpoint
- Docker build can't finish
- Broken serverless worker - wqk2lrr3e9cekc
- Worker is very frequently killed and replaced
- What is the recommended System Req for Building Worker Base Image
- Is there documentation on how to architect runpod serverless?
- Docker image cache
- What port do requests get sent on?
- Serverless calculating capacity & ideal request count vs. queue delay values
- RunPod worker automatic1111 just responds COMPLETED and doesn't return anything
- Serverless GPU low capacity
- Runpod queue not processing
- cudaGetDeviceCount() Error
- VLLM Error
- Getting docker error
- worker-vllm build fails
- Serverless not returning error
- Getting 404 error when making request to serverless endpoint
- out of memory error
- Out of memory errors on 48gb gpu which didn't happen before
- Is it possible to run fully on sync?
- How to keep worker memory after completing request?
- Failed to get job. | Error Type: ClientConnectorError
- Help: Serverless Mixtral OutOfMemory Error
- Can we add minimum GPU configs required for running the popular models like Mistral, Mixtral?
- Serverless 404
- Unacceptably high failed jobs suddenly
- Two Network Volumes
- container start command troubleshooting
- Active worker keeps downloading images and I'm being charged for it
- Webhook problem
- optimize ComfyUI on serverless
- Problem when writing a multiprocessing handler
- Idle time: High Idle time on server but not getting tasks from queue
- Is there a programmatic way to activate servers on high demand / peak-hour load?
- Increasing costs?
- [URGENT] EU-RO region endpoint currently only processing one request at a time
- Unable to Add Container Registry Auth due to Next.js Crashes
- Returning error, but request has status "Completed"
- Can I emulate hitting serverless endpoints locally?
- All 27 workers throttled
- I'm using SDXL serverless endpoint and sometimes I get an error.
- API Wrapper
- How do I create a template that includes my storage drive?
- Deploy from docker hub stuck
- Serverless on Active State behaviour
- LLM inference on serverless solution
- Serverless Pricing
- Broken serverless worker - can't find GPU
- How do multiple GPU priorities assign workers to me?
- Runpod api npm doesn't work
- How do I expose my api key and use CORS instead?
- Worker Errors Out When Sending Simultaneous Requests
- Quick Deploy Serverless Endpoints with ControlNet?
- Mixtral Possible?
- Estimated time comparison - Comfy UI
- Any plans to add other inference engine?
- Are there any options to retrieve container logs via API?
- Serverless scaling
- "Failed to return job results. | 400, message='Bad Request', url=URL('https://api.runpod.ai/v2/gg3lo
- Stable Diffusion API Execution Time
- Serverless Unable to SSH / Use Jupyter Notebook Anymore
- Editing Serverless Template ENV Variable
- Worker's log is not updating in real time. It only pulls the log every 5 mins..
- llama.cpp serverless endpoint
- I think my worker is bugged
- comfyui + runpod serverless
- ECC errors on serverless workers using L4
- Does Runpod Autoupdate Images now for non-matching hashes?
- VllM Memory Error / Runpod Error?
- How do I correctly stream results using runpod-python?
- Status endpoint only returns "COMPLETED" but no answer to the question
- 24GB PRO availability in RO
- Deepseek coder on serverless
- How to write a file to persistent storage on Serverless?
- Run LLM Model on Runpod Serverless
- Safetensor safeopen OS Error device not found
- L40 and 6000 Ada serverless worker not spawning
- Directing requests from the same user to the same worker
- Serverless webhook for executionTimeout
- Is there any way to do dynamic batching?
- Started getting a lot of these "Failed to return job results" errors. Outage?
- Custom serverless deployment
- Automatic A111 WebUI Serverless on Network Volume
- SD Img2Img API does not work with Mask
- unsupported model error
- Logs are missing.
- error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/"
- Is there a way to access worker ID & job ID from a handler? Would be good for logging + debugging
- Serverless errors in the logs
- Issue in pod
- ashleykleynhans/runpod-worker-a1111 img2img not working with a mask?
- max workers set to 2 but endpoint page shows ‘5 idle’
- [FEATURE REQUEST] Granular selection for Serverless Pod GPUs
- Serverless - 404 cannot return results
- Debugging Failed Serverless Requests
- automatic serverless api slow response problem
- webhooks custom updates
- Error generating images
- In serverless GPU, is Delay Time also charged or not?
- sdxl
- Unit for Pricing
- error downloading model? TheBloke/Mixtral-8x7B-MoE-RP-Story-AWQ
- About Queueing
- Network Storage Cache
- About volumes and images
- Api to Text Generation Web UI
- network volume venv serverless
- Container start command behavior
- Docker image and SD Models
- Uploading file to serverless
- GraphQL: How to get the runtime of a serverless pod through the api stateless?
- 2x A100 / 3x 48 GB on Serverless
- SGLang worker (similar to worker-vllm)
- I need to speak about my credits in my account. Thanks
- Insanely Fast Whisper
- Trying to deploy Llava-Mistral using a simple Docker image, receive both success & error msgs
- Worker hangs for really long time, performance is not close to what it should be
- $0 balance in my account
- vllm + Ray issue: Stuck on "Started a local Ray instance."
- Similar speed of workers on different GPUs
- Docker daemon is not started by default?
- VLLM Worker Error that doesn't time out.
- quick python vLLM endpoint example please?
- Best way to deploy a new LLM serverless, where I don't want to build large docker images
- Pause on the yield in async handler
- worker-vllm cannot download private model
- How do I select a custom template without creating a new Endpoint?
- Slow initialization, even with flashboot, counted as execution time
- worker vllm 'build docker image with model inside' fails
- Getting TypeError: Failed to fetch when uploading video
- SSLCertVerificationError from custom api
- Does async generator allow a worker to take off multiple jobs? Concurrency Modifier?
- Does Runpod provide startup free computes grant?
- Custom Checkpoint Model like DreamShaper
- How to force Runpod to pull latest docker image?
- Endpoint creation can't have envs variables
- How to get around the 10/20 MB payload limit?
- /runsync/ getting this error - {"Error":"bad request: body: exceeded max body size of 10MiB"}
- webhook gets called twice
- Add lora inside a docker image with A1111
- question about the data structure of a serverless endpoint
- Cold start time
- all 5 workers throttled
- Tips on avoiding hitting this error whilst checking `/status/:job_id` using requests?
- Newbie question
- Proper way to listen stream
- Can we use other SD models (and Loras) on Quick Deploy serverless?
- Is it possible to release a new version via command line?
- Increase Worker Max Limit
- Empty Tokens Using Mixtral AWQ
- Intermittent Slow Performance Issue with GPU Workers
- Why is the GPU not full?
- All my serverless instances are "initializing" forever
- Is there any way to restart the worker when SSHed into the device?
- OSError: [Errno 122] Disk quota exceeded
- Do the serverless SD APIs have the NSFW filter turned on?
- Failed to queue job
- ComfyUI ValueError: not allowed to raise maximum limit
- Webhook duplicate requests
- Request Format Runpod VLLM Worker
- image returns as base64
- Request stuck in "IN_QUEUE" status
- RunPod vLLM CUDA out of memory
- Automate the generation of the ECR token in Serverless endpoint?
- Worker handling multiple requests concurrently
- Issue with a worker hanging at start
- Serverless inference API
- Do you get charged whilst your request is waiting on throttled workers?
- Is there a way to send a request to cancel a job if it takes too long?
- How to upload a file using an upload API in GPU serverless?
- All of the workers throttled even if it shows medium availability?
- Unreasonably high start times on serverless workers
- Using Same GPU for multiple requests?
- Creating serverless templates via GraphQL
- streaming
- Issue with Worker Initiation Error Leading to Persistent "IN_PROGRESS" Job Status
- Log retention and privacy
- Serverless doesn't work properly when docker image is committed
- [Errno 122] Disk quota exceeded
- Error whilst using Official A1111 Runpod Worker - CUDA error: an illegal instruction was encountered
- Use private image from Google Cloud Artifact Registry
- Outpainting
- Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!
- SCP
- Performance Difference between machine u3q0zswsna6v88 and cizgr1kbbfrp04
- Warming up [Billing]
- Worker not consuming jobs
- RuntimeError: The NVIDIA driver on your system is too old (found version 11080). Please update your
- Worker log says remove container, remove network?
- Hi all. I created a pod, started it, but can't ssh, can't start its "web terminal", can't do anythin
- Should I be getting billed during initialization?
- [RUNPOD] Minimize Worker Load Time (Serverless)
- Runpod VLLM Context Window
- Real time transcription using Serverless
- Failed to load library libonnxruntime_providers_cuda.so
- Setting up MODEL_BASE_PATH when building worker-vllm image
- What does the delay time and execution mean in the request page?
- Extremely slow Delay Time
- Custom template: update environment variables?
- Delay on startup: How long for low usage?
- Why not push results to my webhook??
- Restarting without error message
- Set timeout on each job
- issues using serverless with webhook to AWS API Gateway
- Monitor Logs from command line
- What does "throttled" mean?
- Error building worker-vllm docker image for mixtral 8x7b
- qt.qpa.plugin error with sd-scripts/sdxl_gen_img.py
- Accept new task when continues to process the old one
- I want to use A100 with savings plans!
- Custom Template Taking Hours To Initialize
- How to retire a worker and retry its job?
- Best practices
- Problem with venv
- Experiencing huge execution time on Serverless
- Mount gpu in container
- "Initializing" State Duration
- Issue with Dependencies Not Being Found in Serverless Endpoint
- Running script / ADetailer
- progress updates implementation for Automatic1111 / ComfyUI
- How much RAM do we have per Serverless endpoint?
- Import PIL (pillow image library) in rp_handler.py
- Possible error in docs: Status of a job with python code
- Image is generated successfully, but can't be found for sending back
- Serverless Endpoint Streaming
- How to reduce cold start & execution time?
- How to edit/view handler from a cog on replicate?
- General advice on the pricing and use of serverless
- Custom Handler Error Logging
- Runpod Custom API request and rp_handler.py
- Slow model loading
- Network Volume and GPU availability.
- Number of workers limit
- How do I estimate completion time (ETA) of a job request?
- Does RunPod support setting priority for each job request?
- serverless webhook support secret?
- Queued serverless workers not running and getting charged for it?
- Is dynamically setting a minimum worker viable?
- Issue with unresponsive workers
- Execution time much longer than delay time + actual time
- Advice on Creating Custom RunPod Template
- vLLM problem, CUDA out of memory (I'm using 2 GPUs with RunPod's worker-vllm image)
- Hello, I think my template downloaded the Docker template image while running my request
- accelerate launch best --num_cpu_threads_per_process value ?
- Issue with Request Count Scale Type
- Do I need to keep Pod open after using it to setup serverless APIs for stable diffusion?
- How do you access the endpoint of an LLM deployed on the RunPod web UI through Python?
- Is runpod UI accurate when saying all workers are throttled?
- serverless: any way to figure out what gpu type a job ran on?
- Is it possible to build an API for an automatic1111 extension to be used through Runpod serverless?
- hosting mistral model in production
- Jobs suddenly queuing up: only 1 worker active, 9 jobs queued
- Issues with building the new `worker-vllm` Docker Image
- ImportError: version conflict: '/opt/micromamba/envs/comfyui/lib/python3.10/site-packages/psutil/_ps
- Jupyter runpod proxy extremely slow
- How to transfer outputs when GPU is not available?
- Can I spin up a pod pre-loaded with my /workspace?
- New to RunPod but problems
- Cannot turn on pod to backup data
- error with github workflow
- Create new template from runpod sdxl github release 1.2.0
- Cuda too old
- How to build worker-vllm Docker Image without a model inside?
- when will the status endpoint for a serverless function return 429s?
- Issue with worker-vllm and multiple workers
- Throttled
- 4-minute Serverless (Server Not Ready) constantly
- Cost calculation for serverless
- Unable to access network volume data from serverless deployment
- Stop button missing
- will i be able to use more than 1 gpu per worker in serverless?
- Not able to run Jupyter Lab?
- Fooocus run_anime.bat
- Can't open models/checkpoint folder in Jupyter for Comfy UI.
- Hello guys! I want to buy an RTX 4090 pod, but the 46 GB RAM is not enough. Is there any way to upgrade the RAM?
- Am I able to host an app through reverse proxy with a custom domain name?
- Is it possible to change region of a network volume?
- How do I add a cronjob in a pod?
- Can't connect to Civitai lately when doing wget commands; what am I doing wrong?
- TensorRT-LLM setup
- Stable Diffusion Extension Installation Issues:
- Is it possible to make port 443 externally accessible?
- Comfy launcher issue
- Pods shutting down
- Connection unexpectedly abort
- Downloading file/directory from remote to local using SCP
- Pod errors
- Nvidia driver version
- Profiling CUDA kernels in runpod
- Inconsistency with volumes
- No availability issue
- L40 and shared storage
- Run container only once
- Clone a Runpod Networkvolume
- Insufficient Permissions for Nvidia Multi-GPU Instance (MIG)
- Automatic1111 - Thread creation failed: Resource temporarily unavailable
- How can I view logs remotely?
- change the GPU pod type without recreating
- L40S "no resources available"
- Hi Runpod team is the AttributeError Gradio issue resolved?
- permission problems with ooba and textweb ui containers
- TCP port external mapping keeps changing every time pod restarts.
- I get AttributeError
- Controlnet SDXL Models Don't Work
- Extremely poor performance PODs with the RTX 4090
- Error on RunPod Pytorch 2.1
- No CUDA GPU available after not using GPU for a while
- Hi! Sometimes I can download models from Civitai using wget, but other times I can't. Example:
- Kernel version discrepancy between Pods.
- Whatever I do, the ports do not open for the service
- API to query Pods
- Exposed Port 8888
- Question about Pods and data
- Availability of A40, A6000
- Slow CPU
- slow GPU across many community cloud pods
- CPU Pod with shm size larger than physical RAM
- With a custom template true ssh ask for a password, proxy ssh works perfectly.
- multiple nodes
- Can't access pods after network outage
- wget doesn't work on Civitai models
- 0 x 4090
- A New Gold Tutorial For RunPod & Linux Users : How To Use Storage Network Volume In RunPod & Latest
- Linux kernel version is 5.4.0
- How to scale pod GPU count properly?
- distributed training
- How can i bulk download all my images generated in my Output Folder
- Data loss on pod
- Upload files to Network volume? Two days spent on this and can't make it happen
- Shell asks for a password when I try to ssh to a secure cloud pod (with correct public key set)
- runpodctl create pod for CPU only
- docker not found
- How to mount network volume to the pod?
- Securing Gradio App on Runpod with IP Whitelist
- load a new network volumen into a pod?
- The Bloke LLM Template ExLlamaV2Cache_Q4 Error
- Hello, I have a docker image downloaded on to the pod. How to I use my custom image?
- Machine does not support exposing a TCP port
- Cannot Install JAX
- GPU Name"NVIDIA RTX 4000 Ada Gene..."GPU 0"Error: CUDA unknown error - this may be due to an
- how to get kernel 5.5.0?
- Stable Diffusion Stopped Working After a Restart
- How to start a tensorboard from the pod?
- Losing all important data in /workspace folder while pod is running :(
- Installing Bittensor?
- Connectivity issue on 4090 pod
- P2P is disabled between NVLINK connected GPUs 1 and 0
- Pod with different IPS?
- No GPU Available
- Find Config of Deleted Pod
- torch.cuda.is_available() is False
- UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda
- Latest version of Automatic1111 in 'RunPod Automatic1111 Stable Diffusion Template t '
- How to stop a Network Disk
- Pod Downsized, with Pictures
- I'm pretty sure I've been getting pods where "/" lives on a network disk
- Question about graphql API
- Create new pod with runpodctl
- Community cloud servers repeatedly fail to correctly download containers
- Urgent: All new gpu pods are broken
- CPU Pods NOT WORKING
- GPU usage when pod initialized. Not able to clear.
- Chat History, Memory and Messages
- Increase number of GPU-s in the existing pod?
- Keeping reverse proxy hostname between destroy/start
- Cuda 12.0 version template is missing
- Not able to connect to Web Terminal after increasing the container disk size of the pod
- Need to move credit from personal account to team account
- Waiting for hours
- error in pod
- Why are secure cloud pods so slow?
- Different levels of performance from same GPU types in Community Cloud
- No GPU, RO RTX4090 node
- Could not find CUDA drivers
- Ignore root start.sh and use custom persistent script.
- streamlit app not loading up on CPU node
- Issues with changing file permission to 400
- Why FileBrowser cant be opened?
- Are there very few GPUs that support CUDA 11.8?
- GPU speed getting slower and slower
- "How can I run multiple templates in one pod?"
- How do I run Docker in a RunPod environment?
- [ONNXRuntimeError] when running ComfyUI
- Running sshuttle in my pod
- How to stop a Pod ?
- Network issues with 3090 pods
- are we able to run DinD image for GPU pods?
- Runpod error starting container
- Runpod SD ComfyUI Template missing??
- Pod Outage
- Cuda - Out of Memory error when the 2nd GPU not utilized
- Backdrop Build V3 Credits missing
- When on 4000 ADA, it's RANDOMLY NOT DETECTING GPU!
- Can't get my pod to work right
- Error occurred
- Can I still access the data of my GPU pod once my account runs out of funds?
- Can I Sync Contabo storage
- Save docker session
- Frequent GPU problem with H100
- Pod seems to have lost internet.
- OSError: [Errno 5] Input/output error
- Error while running ComfyUI
- GPU cloud storage GONE + billed for entire month
- Trying to create a Spot GPU instance leads to 400 response error
- Where are all the U.S. network volume data centers?
- Managing multiple pod discovery
- How to withdraw money ?
- Inconsistent speeds on community pod, any tips?
- H100 PCIe and SXM stability issues
- 2024-03-01T16:08:54.761577365Z [FATAL tini (6)] exec docker failed: No such file or directory Error
- I want to install docker in a GPU pod.
- OpenBLAS error
- We have detected a critical error on this machine which may affect some pods.
- Is it possible to restart the pod using manage Pod GraphQL API?
- Training for days
- Disk reading unacceptably and mind-bogglingly slow
- "Pricing error for savings plan"
- /workspace not writable
- Tokenizer error
- How to use the comfyui API when running it inside Runpod GPU pods
- GPU Host Registration
- Help with constantly crashing GPU pods
- [Urgent] failed : Software caused connection abort
- how to distribute usage of GPU
- Converting to Team Account
- terminal
- Compatibility of RTX A6000 for Multi-GPU Training
- H100 multi-gpus settings
- Container fails to start randomly
- s3 slow upload
- About the cost of container initialization phase
- Broken CUDA / PyTorch on H100
- Cannot connect to pod, web UI stating "Network Issues", https://uptime.runpod.io/ showing all green
- Cannot connect to CPU pods
- My pods are missing, but still charge me everyday
- Network issue?
- Pod running but inaccessible
- instances available A100 80GB
- https://www.runpod.io/console/pods keeps reordering servers
- A1111 wont find my files
- ngc tritonserver container image not usable?
- "Too many open files in system"
- What the fuck is going on again with US - 1 x H100 80GB SXM5
- GPU runpod critical error detected
- stable diffusion - how do I view the active log?
- Pod using CPU instead of GPU
- GPU not usable
- After trying the service for the first time, out of funds because of a stale pod after disconnecting
- pod does not show public ip & ports
- Pod is unable to find/use GPU in python
- Pod is stuck in a loop and does not finish creating
- Runpodctl in container receiving 401
- Cannot establish connection for web terminal using Standard Diffusion pod
- Runpod errors, all pods having same issue this morning. Important operation
- Hi, I have a problem with two of my very important services, and I received the following message
- Error while using vLLm in RTX A6000
- 502 error when trying to connect to SD Pod HTTP Service on Runpod
- correct way to call jupyter in template
- Too many failed requests
- Community pod: very bad download speed from github.
- Skypilot + Runpod: No resource satisfying the request
- `runpodctl stop pod $RUNPOD_POD_ID` failing with 401
- Stuck pod instance
- Start container pod error
- Pod doesn't recognize my SSH key
- Run Lorax on Runpod (Serverless)
- What is the difference between secure cloud and Community Cloud?
- Urgent Prod Issue
- cuda version filter
- Maximum length for value of environment variables
- Enquiry about pod ID oi3rnyumuzvp2s
- GraphQL Cuda Version
- Any template with python 3.9.* or how to install it
- Match IPs with GPUs
- Container is not running error
- Pod stopped on restarting no data
- Zero GPU issue
- Start and stop multiple pods
- `runpodctl send` crawling at <1MB speeds
- Cannot create pods even there are available gpus
- Transfer/Duplicate Network Volume
- screen spot
- /usr/bin/bash: cannot execute binary file
- sudo missing
- Can I watch system utilization in linux terminal?
- Network Storage load issue
- How do I edit the pre_start file on a pod and have it persist?
- Multi GPU
- Unable to use model in stable diffusion
- Need help with setting up Tensorboard for RVC!
- Storage pricing question
- Creating own template
- Error when installing requirements of git:
- Container keeps restarting
- Unable to upload models to Stable Diffusion.
- How should I store/load my data for network storage?
- worker-vllm list of strings
- How to enable Systemd or use VPN to connect the IP of the Run Pod?
- best practice to terminate pods on job completion
- Can I turn off a few vCPUs?
- Deploying H2O LLM Studio /w auth using Ngrok
- Wrong GPUs being assigned
- Network Volume suddenly empty in EU-RO-1
- Reserving pods on different machines
- Ollama API
- Is one physical CPU core assigned to vCPU?
- We have detected a critical error on this machine!
- Slow upload speeds with runpodctl?
- Expected all tensors to be on the same device
- Urgent: Workspace Disconnected
- Speedtest for slow pod
- TCP Port Not Working
- Can't login
- Stable Diffusion GPU Pod and API
- Horrible network speeds make the pod unusable.
- How can I deploy Mixtral using Ollama as service?
- 520: Web server is returning an unknown error
- Driver mismatch
- Servers' availability: "Any" region vs Specific regions
- File copying does not occur in Custom Template
- Having trouble with Serverless SD XL image
- What does "Low Availability" mean?
- Network bandwidth?
- Docker In Docker custom image for GPU pods and Presistant or Network volume support in CPU Pods?
- Getting ECONNREFUSED while trying to communicate on exposed TCP port with ComfyUI API
- How to expose a TCP port without losing the pod data?
- Error connecting to runpod
- Transfering files to new Pod
- Nonexistent download speed.
- IPv6 Support?
- Docker issues on RTX A6000 ADA gpu pod.
- Error connecting to gpu cloud instance.
- Unable to register, email blocked.
- Jupyter Notebook is not showing the output of any code
- How to find the proper template: "The NVIDIA driver on your system is too old"
- Cannot SSH login from Cursor (VS Code)?
- Mass files download from google drive
- Hosting RTX A4000 GPU's in Community Cloud
- RTX 4090 POD Cuda issue
- Secrets character limit & validation
- NO-region pods keep blocking when starting my Docker image
- Public setup IP Unreliable
- Host payout
- GraphQL: Query specific Endpoints and getting running worker amount
- podTerminate query returns error GRAPHQL_VALIDATION_FAILED
- How do I run custom code on a Runpod instance?
- Running on local URL but can't access from outside
- How can I use ollama Docker image?
- ComfyUI won't run because of missing NVIDIA drivers
- RunPod Library + API
- Cuda Driver
- Very low download speed. Will take days to download the model
- How do I start a pod with a private docker image (template) using GraphQL?
- I just re-initialized a suspended pod and now I don't have gpu drivers
- Assistance Requested for Pod Initialization Issue
- Overcharged for Pod.
- Missing port buttons and Unable to “start web terminal” on Ultimate Template
- Any recent firewall changes?
- Becoming a host MI250
- GPU Pod was down all the night
- H100 cluster group compilation error
- Stuck in creating container
- Custom template bash: /start.sh: No such file or directory
- Why are my model files only 135 bytes after a clone repository on Pytorch template?
- I cannot connect to server using Web Terminal. It says 'Connection Closed'
- Proxy Url related info
- Deploying a PDF converter app to serverless
- Managing savings plan using graphql API
- "There are no longer any instances available with enough disk space" from graphql
- How to use multiple GPUs for Kohya Training?
- question about reserving time
- /opt/nvidia/nvidia_entrypoint.sh: line 67: exec: docker: not found
- (Not solved, needed to add a tag) Possible network flakiness with network volumes on EU-RO-1
- My pod disk is full.
- How exactly to join a Community Pod as a GPU provider
- Can't Delete Storage Volume
- Issue uploading files to Jupyter
- Kohya_ss: Syncing For >20 Minutes?
- Can't upload videos, getting TypeError: Failed to fetch
- Deploying yolov8 on RunPod
- Pod impossible to access
- Multiple Issues
- Windows OS Available?
- Custom Templates are not loading on Secure Cloud
- can’t run my own init script
- How do i create an encrypted volume programmatically?
- Pod still asking to log in
- ComfyUI Manager button doesn't show
- Errors while running FaceFusion 2.2.1
- Trying to run a Fooocus Realistic Edition POD and running into errors.
- Python3.8.10 and Venv
- Controlnets not working
- No longer able to Use Jax on H100 machines
- Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0
- Could you please implement template credentials functionality for python SDK
- download problem
- Azure speech services on runpod
- Stable Diffusion ComfyUI: Error while deserializing header: HeaderTooLarge
- How to run ComfyUI on RunPod?
- "Host is Down" error
- Automate pod&template creation
- Test
- Service not started
- RunPod Automatic1111 Installation?
- nvidia-glx-desktop - how to make it work
- need SU password for the RunPod Desktop template 'runpod/kasm-docker:cuda11'
- Custom template creation with AWS ECS
- When trying to git pull Comfy nodes into my RunPod, I'm met with a divergent branch error?
- Running 2x H100 80gb. Does this mean my cap is now 160gb of vram?
- GPU cloud template to manage network volume
- Cache a Docker image to reuse
- RTX 3090 is available on the selection page but my stopped pod still has 0 GPUs
- After scheduled maintenance on my pod today, I can no longer connect to the TCP port I set up with venv
- Issue installing Fooocus on RunPod
- how can i see the GB usage of my network volume?
- sh: 1: accelerate: not found
- Secure Cloud L40
- A way to connect to an AWS VPC
- 8x H100 SXM5, Error 802
- Attaching a Network Volume fails when using GraphQL
- Container logs disappear after stopping the container
- CUDA 12.3 support
- Is there a way to get pod logs programmatically?
- GPUs look available via `runpod.api.ctl_commands.get_gpu()` which aren't available.
- Serverless endpoint long waits in "Initializing" state
- Fooocus too slow on generation
- Image Generation problem
- could not start a temporarily closed pod
- Outdated controlnet how to update?
- There are no available GPUs on this host machine
- copy folders from one location to another, inside Jupyterlab?
- a6000 is apparently all gone but still available on page
- Empty trash?
- Versioning serverless endpoints
- how can I find my pod's ip address?
- "This server has recently suffered a network outage and may have spotty network connectivity." and
- Multinode training Runpod ports
- Feature Request / Is it possible RunpodCTL
- How to mount persistent storage volume in pod?
- RunPod SD InvokeAI v3.3.0 Errors
- ENDPOINT IS
- connect ssh vscode to runpod gpu server
- environment variable not accessible from true ssh ?
- Pod disappeared after yesterday's maintenance
- How to enable Jupyter Notebook and SSH support in a custom Docker container?
- open ports
- [Urgent] One GPU suddenly went away
- Does GPU Cloud service support Illyasviel/Fooocus AI?
- Pod suddenly says "0x A100 80GB" and cuda not available
- Moving storage location
- is your network volume charged by actual usage or the fixed number keyed in during setup?
- Error 804: forward compatibility was attempted on non supported HW
- Error: fork: retry: Resource temporarily unavailable
- "We have detected a critical error on this machine...failing pods
- Webhook URL
- stop pod
- How to transfer between pods?
- Network connection
- Multi-node training with multiple pods sharing same region.
- Dev Accounts Adding Public Key
- Does Runpod Support Kubernetes?
- Is GPU Cloud suitable for deploying LLMs, or only for training?
- Issues with connecting/initializing custom docker image
- Error occurred when executing STMFNet VFI: No module named 'cupy'
- My pod starts very slowly
- Template sharing in a team doesn't work
- ComfyUI not launching
- I can't shut down my pod?
- LocalAI Deployment
- Jupyter notebook (in Chrome tab) consistently crashing after 20 hours
- Extremely slow sync speed
- How can I remove a network volume?
- Can I remove a GPU & resize my storage after I've created a pod?
- Need to update Auto1111 to 1.7.0
- How can I clean up storage in my network volume?
- Is there a way to get the SSH Terminal address for a pod using GraphQL api?
- Help deploying LLaVA Flask API
- Does RunPod support H100 confidential computing?
- Restricting the kinds of pods dev accounts can launch
- ssh2 with node doesn't work correctly ?
- Error starting the container
- Are the EU-CZ-1 servers down?
- Extremely slow network and hard to connect through SSH or Jupyter
- remote desktop with pods
- My pods in the CZ network are down.
- Can I use VsCode remote-ssh with a runpod instance with no public ip?
- How to install SillyTavern to an instance?
- refer to the current running pod's id from environment variable
- Cannot connect to jupyterlab/web terminal
- How do I upload a model to GPU-Cloud Stable Diffusion?
- Unable to SSH
- 24 GB VRAM is not enough for simple kohya_ss LORA generation.
- Trouble with SSH via PuTTY
- install in network volume
- How can I enter Stable Diffusion WebUI arguments in an instance with the SD template?
- Running LLaMA remotely from a Python script
- Urgent! So slow Download Speeds (Both Secure / Community Clouds)
- How to use runpod for multi-machine distributed training?
- Check if a pod is idle
- TheBloke and LLM not working
- Large discrepancy in broadband available and broadband used
- reproducible: pods crash 50% of the time
- server problem
- Slow download speed as well - over 6 hours downloading 4+ GB of files (and still running).
- Ensuring SSH over exposed TCP
- Immediate assistance required!
- Immediate Assistance Required: Ongoing Service Disruption and Request for Compensation
- Speed of downloading files from server abnormally slow
- Services don't start
- CUDA not recognized
- Services Stopped
- ComfyUI custom nodes (IMPORT FAILED) after server stop
- Cuda error: illegal memory access encountered
- Cuda out of memory
- cannot install flask
- Creating a Custom Template
- Problems with larger models
- Storage contents disappear
- Pod stuck trying to install dependencies.
- Slow upload speed to Jupyter
- Pods not starting
- Integrating Loras and Checkpoint into Fooocus ashleykza/fooocus:2.1.855 with preset realistic
- The actual storage space of the network volume is wrong.
- If my RunPod ran out of money and stopped running.
- billing not adding up
- SSH key not working