Help Needed: Multiple YOLO Model Instances Causing GPU Overload With Multi-Camera Real-Time CCTV
Hi everyone,
I’m working on a real-time video processing project at a chicken farm, where I need to continuously detect objects and track daily activities such as feedbag dump counts. I currently process 10 CCTV streams simultaneously on a local workstation with an RTX 5080 GPU.
I’m using FFmpeg to fetch frames from IP cameras in real time and running inference using 5 trained YOLO models — one large (YOLOv11-L) and four small variant models.
The main issue I’m facing:
Some cameras use the same model, but a new model instance is being created for each camera.
As a result, more than 5 model inference sessions get loaded into GPU memory, causing GPU overutilization and eventual system crashes or restarts.
When I scale up to 10 streams, GPU memory gets exceeded quickly because every stream launches its own model load.
What I need help with:
Why is YOLO creating multiple model inference instances even when models are shared across cameras?
How can I properly share a single model inference instance across multiple camera streams?
How do I prevent GPU overload and system restarts when running multi-stream pipelines?
Technical Setup
GPU: NVIDIA RTX 5080
Framework: YOLOv11
Input: 10 IP cameras, real-time
Frame ingestion: FFmpeg streaming
Multiple models assigned to different logic tasks
Continuous object detection & tracking required
Any recommended architectures, multiprocessing strategies, or best practices would be greatly appreciated — especially for managing multiple camera feeds with shared model weights without repeatedly loading them in GPU memory.
Thanks in advance!
7 Replies
"Why is YOLO creating multiple model inference instances even when models are shared across cameras?"
YOLO is a model. It doesn't do anything on its own. Your code is probably unoptimized and is the cause of this.
"How can I properly share a single model inference instance across multiple camera streams?"
Don't load the model once per camera in your code; load it a single time and use that same loaded model for inference on all of them (see the sketch below).
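A minimal sketch of that pattern, assuming RTSP camera URLs and OpenCV for frame capture (both placeholders, not from the thread): each capture thread feeds a shared queue, and a single inference loop serves every camera, so the model's weights are loaded onto the GPU exactly once.

import queue
import threading

import cv2
from ultralytics import YOLO

# Hypothetical camera URLs; replace with the real RTSP endpoints.
CAMERA_URLS = ["rtsp://cam1/stream", "rtsp://cam2/stream"]

# Load the model exactly once, at startup, outside any per-camera code.
shared_model = YOLO("trained_model_v19l.pt")

# Bounded queue so slow inference drops frames instead of falling behind real time.
frames = queue.Queue(maxsize=32)

def capture(cam_id, url):
    # cv2.VideoCapture uses FFmpeg under the hood for RTSP streams.
    cap = cv2.VideoCapture(url)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        try:
            frames.put((cam_id, frame), timeout=1)
        except queue.Full:
            pass  # drop this frame rather than stall the camera

for i, url in enumerate(CAMERA_URLS):
    threading.Thread(target=capture, args=(i, url), daemon=True).start()

# One inference loop serves every camera, so only one copy of the weights
# ever sits in GPU memory for this model.
while True:
    cam_id, frame = frames.get()
    results = shared_model(frame, verbose=False)
    # ...route results to the per-camera counting logic using cam_id...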
Yes, the issue is when I have two cameras tracking feedbag dumps: with model() the feedbags are not tracked, since no separate YOLO track ID is assigned to each feedbag, while with model.track() separate IDs do get assigned. But as far as I checked, model.track() requires a separate inference instance for each camera to track the feedbags and keep each camera's tracks distinct (a possible workaround is sketched below).
Can I share the script?
Is the RTX 5080 suitable for this, or could there be a processor problem?
I have also used threading in Python for each model's inference.
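One way around the model.track() limitation described above (a sketch, not from the thread): run plain detection with a single shared model and keep one independent tracker object per camera, e.g. with the supervision package's ByteTrack, so track IDs never mix between streams.

import supervision as sv  # pip install supervision; not mentioned in the thread
from ultralytics import YOLO

shared_model = YOLO("trained_model_v19l.pt")  # loaded once for all cameras

NUM_CAMERAS = 10
trackers = [sv.ByteTrack() for _ in range(NUM_CAMERAS)]  # one tracker per stream

def process_frame(cam_id, frame):
    # One shared set of weights on the GPU...
    result = shared_model(frame, verbose=False)[0]
    detections = sv.Detections.from_ultralytics(result)
    # ...but per-camera tracking state, so IDs stay distinct per stream.
    tracked = trackers[cam_id].update_with_detections(detections)
    return tracked.tracker_id, tracked.xyxy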
You could use something like a Triton Inference Server and serve the model through that; it does automatic batching as well.
https://docs.ultralytics.com/guides/triton-inference-server/
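Per that guide, the Ultralytics YOLO class can point straight at a Triton endpoint; a minimal client sketch, assuming the server runs on localhost:8000 with a model named "yolo" (both placeholders):

from ultralytics import YOLO

# Server address and model name are placeholders; see the linked guide for
# exporting the weights and laying out the Triton model repository.
model = YOLO("http://localhost:8000/yolo", task="detect")
results = model("frame.jpg")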
I have tried this way, but Triton Inference Server cannot use the .pt model, so I had to convert it to ONNX. After converting to ONNX, I have been facing a lot of misdetections.
Below is the code I used for the conversion:
model = YOLO("C:/python/scripts/ai_detection_script/models/trained_model_v19l.pt")
model.export(
format='onnx',
imgsz=640, # Height, Width
dynamic=False
)
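Not asked in the thread, but one way to isolate where the misdetections come from is to validate both checkpoints on the same dataset and compare mAP; if the numbers match, the problem lies in the Triton pre/post-processing rather than in the ONNX conversion itself. A sketch with placeholder paths:

from ultralytics import YOLO

# Placeholder paths; point both at the same data.yaml used for training.
pt_metrics = YOLO("trained_model_v19l.pt").val(data="data.yaml", imgsz=640)
onnx_metrics = YOLO("trained_model_v19l.onnx").val(data="data.yaml", imgsz=640)
print(pt_metrics.box.map50, onnx_metrics.box.map50)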
"The accuracy would be the same unless your original model was trained with a different imgsz and not 640."
I have trained it using the configuration below:
model = YOLO("yolo11l.pt")
model.train(data=yaml_path, epochs=100, batch=16, imgsz=640)
You can try converting with dynamic=True.
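That is the same export call as above, just with dynamic=True; dynamic axes let the exported ONNX model accept varying input shapes, which can also matter when Triton batches requests. A sketch with a placeholder model path:

from ultralytics import YOLO

model = YOLO("trained_model_v19l.pt")  # placeholder path
model.export(format='onnx', imgsz=640, dynamic=True)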