How do I set quantization to fp8 in the serverless settings?
Store models in VRAM
Setting up RunPod serverless from scratch
RunPod Serverless Endpoint Issue - Jobs Complete But No Output Returned
Serverless Endpoint with vLLM (Qwen2.5-VL-3B-Instruct)
Build in pending for hours
5090 disappeared in Serverless

Setting up a serverless endpoint for a custom model
Regions with better/guaranteed bandwidth
Updating CMD override after creating endpoint
Configuring endpoints via API if publishing via GitHub integration
S3 path for serverless image gen uploads (ComfyUI)
Most Available GPUs
Serverless worker won't even start but counts as running
CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected.
I tried to tweak the Docker image a bit, but with no success.
Then I made the Docker image install the nvidia-cuda-toolkit package, and now, on the 5090 pod, I get the error: CUDA error: no kernel image is available for execution on the device...

For 24G Pro Machines: Memory Allocation Must Not Be Less Than 60GB
Requests vs Jobs

How to create a serverless template
