Problem when writing a multiprocessing handler

Hi there! I have an issue when writing a handler that processes two tasks in parallel (I use ThreadPoolExecutor). I use the Hugging Face transformers library to load the models and LangChain to run the inference. I tested my handler on Google Colab and it works well, so I built my Docker template and created an endpoint on RunPod, but at inference time I constantly get an error: CUDA error: device-side assert triggered, which I don't get when testing the handler on Colab. How can I handle that, and in particular, what can cause this error? I use a 48 GB GPU (more than enough for my models, which take around 18 GB in total), so it can't be a resource issue.
3 Replies
ashleyk — 4mo ago
If you're trying to process concurrent jobs, you need to follow this doc: https://docs.runpod.io/serverless/workers/handlers/handler-concurrency
Concurrent Handlers | RunPod Documentation
RunPod supports asynchronous functions for request handling, enabling a single worker to manage multiple tasks concurrently through non-blocking operations. This capability allows for efficient task switching and resource utilization.
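For reference, here is a minimal sketch of the pattern that doc describes, assuming the runpod Python SDK's async handler and its concurrency_modifier option; the input schema and the limit of 2 are placeholders:

```python
import runpod

# Load models once at module scope (not inside the handler) so that
# concurrent jobs share them instead of re-loading per request.

async def handler(job):
    prompt = job["input"]["prompt"]  # placeholder input schema
    # ... run your transformers/LangChain inference here ...
    return {"generated": prompt}

def concurrency_modifier(current_concurrency):
    # Cap this worker at 2 concurrent jobs (placeholder value).
    return 2

runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": concurrency_modifier,
})
```

The key difference from a local handler is that the handler is an async function and the SDK is told how many jobs one worker may hold at once, so a single worker can interleave the two tasks without blocking.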
Blah Blah — 4mo ago
Thanks! I'll try that. I naively thought I wouldn't have to change anything from the local handler code. Hopefully that solves the problem.
justin — 4mo ago
GitHub
Runpod-OpenLLM-Pod-and-Serverless/handler.py at main · justinwlin/R...
A repo for OpenLLM to run on a Pod. Contribute to justinwlin/Runpod-OpenLLM-Pod-and-Serverless development by creating an account on GitHub.