Help Needed: Multiple YOLO Model Instances Causing GPU Overload With Multi-Camera Real-Time CCTV
Hi everyone,
I’m working on a real-time video processing project at a chicken farm, where I need to continuously detect objects and track daily activities such as feedbag dump counts. I currently process 10 CCTV streams simultaneously on a local workstation with an RTX 5080 GPU.
I’m using FFmpeg to fetch frames from IP cameras in real time and running inference using 5 trained YOLO models — one large (YOLOv11-L) and four small variant models.
The main issue I’m facing:
Some cameras use the same model, but a new model instance is being created for each camera.
As a result, more than 5 model inference sessions get loaded into GPU memory, causing GPU overutilization and eventual system crashes or restarts.
When I scale up to 10 streams, GPU memory gets exceeded quickly because every stream launches its own model load.
What I need help with:
Why is YOLO creating multiple model inference instances even when models are shared across cameras?
How can I properly share a single model inference instance across multiple camera streams?
How do I prevent GPU overload and system restarts when running multi-stream pipelines?
Technical Setup
GPU: NVIDIA RTX 5080
Framework: YOLOv11
Input: 10 IP cameras, real-time
Frame ingestion: FFmpeg streaming
Multiple models assigned to different logic tasks
Continuous object detection & tracking required
Any recommended architectures, multiprocessing strategies, or best practices would be greatly appreciated — especially for managing multiple camera feeds with shared model weights without repeatedly loading them in GPU memory.
Thanks in advance!
7 Replies
"Why is YOLO creating multiple model inference instances even when models are shared across cameras?"
YOLO is a model. It doesn't do anything on its own. Your code is probably unoptimized and is the cause of this.
"How can I properly share a single model inference instance across multiple camera streams?"
Don't load the model once per camera in your code; load it a single time and use that same loaded model for inference on all of them (see the sketch below).
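A minimal sketch of that pattern, assuming RTSP camera URLs and OpenCV for frame capture (both placeholders, not from the thread): each capture thread feeds a shared queue, and a single inference loop serves every camera, so the model's weights are loaded onto the GPU exactly once.

import queue
import threading

import cv2
from ultralytics import YOLO

# Hypothetical camera URLs; replace with the real RTSP endpoints.
CAMERA_URLS = ["rtsp://cam1/stream", "rtsp://cam2/stream"]

# Load the model exactly once, at startup, outside any per-camera code.
shared_model = YOLO("trained_model_v19l.pt")

# Bounded queue so slow inference drops frames instead of falling behind real time.
frames = queue.Queue(maxsize=32)

def capture(cam_id, url):
    # cv2.VideoCapture uses FFmpeg under the hood for RTSP streams.
    cap = cv2.VideoCapture(url)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        try:
            frames.put((cam_id, frame), timeout=1)
        except queue.Full:
            pass  # drop this frame rather than stall the camera

for i, url in enumerate(CAMERA_URLS):
    threading.Thread(target=capture, args=(i, url), daemon=True).start()

# One inference loop serves every camera, so only one copy of the weights
# ever sits in GPU memory for this model.
while True:
    cam_id, frame = frames.get()
    results = shared_model(frame, verbose=False)
    # ...route results to the per-camera counting logic using cam_id...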
Yes, the issue is when I have two cameras tracking feedbag dumps: with model() the feedbags are not tracked, since no separate YOLO track ID is assigned to each feedbag, while with model.track() separate IDs do get assigned. But as far as I checked, model.track() requires a separate inference instance for each camera to track the feedbags and keep each camera's tracks distinct (a possible workaround is sketched below).
Can I share the script?
Is the RTX 5080 suitable for this, or could there be a processor problem?
I have also used threading in Python for each model's inference.
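One way around the model.track() limitation described above (a sketch, not from the thread): run plain detection with a single shared model and keep one independent tracker object per camera, e.g. with the supervision package's ByteTrack, so track IDs never mix between streams.

import supervision as sv  # pip install supervision; not mentioned in the thread
from ultralytics import YOLO

shared_model = YOLO("trained_model_v19l.pt")  # loaded once for all cameras

NUM_CAMERAS = 10
trackers = [sv.ByteTrack() for _ in range(NUM_CAMERAS)]  # one tracker per stream

def process_frame(cam_id, frame):
    # One shared set of weights on the GPU...
    result = shared_model(frame, verbose=False)[0]
    detections = sv.Detections.from_ultralytics(result)
    # ...but per-camera tracking state, so IDs stay distinct per stream.
    tracked = trackers[cam_id].update_with_detections(detections)
    return tracked.tracker_id, tracked.xyxy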
You could use something like a Triton Inference Server and serve the model through that; it does automatic batching as well.
https://docs.ultralytics.com/guides/triton-inference-server/
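Per that guide, the Ultralytics YOLO class can point straight at a Triton endpoint; a minimal client sketch, assuming the server runs on localhost:8000 with a model named "yolo" (both placeholders):

from ultralytics import YOLO

# Server address and model name are placeholders; see the linked guide for
# exporting the weights and laying out the Triton model repository.
model = YOLO("http://localhost:8000/yolo", task="detect")
results = model("frame.jpg")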
I have tried this way, but Triton Inference Server cannot use the .pt model, so I had to convert it to ONNX. After converting to ONNX, I have been facing a lot of misdetections.
Below is the code I used for the conversion:
model = YOLO("C:/python/scripts/ai_detection_script/models/trained_model_v19l.pt")
model.export(
format='onnx',
imgsz=640, # Height, Width
dynamic=False
)
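Not asked in the thread, but one way to isolate where the misdetections come from is to validate both checkpoints on the same dataset and compare mAP; if the numbers match, the problem lies in the Triton pre/post-processing rather than in the ONNX conversion itself. A sketch with placeholder paths:

from ultralytics import YOLO

# Placeholder paths; point both at the same data.yaml used for training.
pt_metrics = YOLO("trained_model_v19l.pt").val(data="data.yaml", imgsz=640)
onnx_metrics = YOLO("trained_model_v19l.onnx").val(data="data.yaml", imgsz=640)
print(pt_metrics.box.map50, onnx_metrics.box.map50)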
"The accuracy would be the same unless your original model was trained with a different imgsz and not 640."
I have trained it using the configuration below:
model = YOLO("yolo11l.pt")
model.train(data=yaml_path, epochs=100, batch=16, imgsz=640)
You can try converting with dynamic=True.
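That is the same export call as above, just with dynamic=True; dynamic axes let the exported ONNX model accept varying input shapes, which can also matter when Triton batches requests. A sketch with a placeholder model path:

from ultralytics import YOLO

model = YOLO("trained_model_v19l.pt")  # placeholder path
model.export(format='onnx', imgsz=640, dynamic=True)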