RunPod•3mo ago
rewbs

Docker image using headless OpenGL (EGL, surfaceless platform) OK locally, falls back to CPU in Runpod

Hi all, I'm wondering if anyone can educate me on what would be causing this difference in behaviour when running a container locally versus in Runpod, and whether there is a solution. In summary, I'm trying to run a headless OpenGL program in a Docker container by using EGL with the surfaceless platform (https://registry.khronos.org/EGL/extensions/MESA/EGL_MESA_platform_surfaceless.txt). I was able to get the program working as intended in a container outside of Runpod. But once deployed to Runpod, it falls back to CPU processing. As a minimal testcase, it's sufficient to simply run eglinfo, a utility which tells you what EGL devices are available. Outside of Runpod multiple devices are reported, but in Runpod none are. The testcase and example outputs are available here: https://github.com/rewbs/egldockertest . Any ideas very much appreciated! (As an aside, I should note I'm by no means an OpenGL expert, so I might be getting confused, or at the very least getting the terminology wrong.)
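For anyone who wants to poke at this without the full repo, here is a rough Python sketch of what eglinfo is probing. It assumes a GLVND/NVIDIA libEGL.so.1 is present and exposes the EGL_EXT_device_enumeration and EGL_EXT_platform_device extensions; the constants and entry points come from the EGL registry, not from the repo above. On the failing RunPod worker the device count comes back as 0:
```python
import ctypes

EGL_PLATFORM_DEVICE_EXT = 0x313F  # from EGL_EXT_platform_device
EGL_TRUE = 1

egl = ctypes.CDLL("libEGL.so.1")
egl.eglGetProcAddress.restype = ctypes.c_void_p
egl.eglGetProcAddress.argtypes = [ctypes.c_char_p]

def load_ext(name, restype, argtypes):
    # EGL extension entry points must be fetched through eglGetProcAddress.
    addr = egl.eglGetProcAddress(name.encode())
    if not addr:
        raise RuntimeError(f"{name} not exposed by this libEGL")
    return ctypes.CFUNCTYPE(restype, *argtypes)(addr)

eglQueryDevicesEXT = load_ext(
    "eglQueryDevicesEXT", ctypes.c_uint,
    [ctypes.c_int, ctypes.POINTER(ctypes.c_void_p), ctypes.POINTER(ctypes.c_int)])
eglGetPlatformDisplayEXT = load_ext(
    "eglGetPlatformDisplayEXT", ctypes.c_void_p,
    [ctypes.c_uint, ctypes.c_void_p, ctypes.c_void_p])

# Ask the driver how many EGL devices it can see (eglinfo does the same).
devices = (ctypes.c_void_p * 8)()
count = ctypes.c_int(0)
if eglQueryDevicesEXT(8, devices, ctypes.byref(count)) != EGL_TRUE:
    raise SystemExit("eglQueryDevicesEXT failed")
print(f"{count.value} EGL device(s) found")  # 0 is the failure mode seen on RunPod

# Try to initialise a display on each device (this is what headless rendering needs).
for i in range(count.value):
    dpy = eglGetPlatformDisplayEXT(EGL_PLATFORM_DEVICE_EXT, devices[i], None)
    major, minor = ctypes.c_int(), ctypes.c_int()
    ok = egl.eglInitialize(ctypes.c_void_p(dpy), ctypes.byref(major), ctypes.byref(minor))
    print(f"device {i}: eglInitialize -> {bool(ok)} (EGL {major.value}.{minor.value})")
```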
Madiator2011
Madiator2011•3mo ago
What kind of program are you trying to run?
RobBalla
RobBalla•3mo ago
My desktop image uses EGL and is derived from Selkies EGL for Kubernetes (linked in the repo). You'll need to install Nvidia display drivers because there is no /dev/dri on RunPod
rewbs
rewbs•3mo ago
An old-school audio-visualisation renderer (I'm the author of https://vizrecord.app/ which is client side – from there you can probably guess what I'm building 🙂). Thanks so much, will take a look. Wow, that looks like an impressive piece of work. Am I right in thinking your image re-installs the driver on every startup? If so, I assume it's designed for a long-running pod rather than serverless tasks – and probably won't be sensible for my serverless use case, where a job execution would typically be under 30s.
RobBalla
RobBalla•3mo ago
Yeah, that wouldn't make much sense unfortunately. I raised an issue with the Selkies EGL repo and their feedback was that the driver install shouldn't be necessary, but my experience was that rendering fell back to llvmpipe without it. I am hopeful there is a solution though.
tom
tom•4w ago
Hey! Been a while, but I'm running into the same problem. Were you able to resolve it?
rewbs
rewbs•3w ago
Hey, nope – still can't get it to run on the GPU. I'm resorting to running this process in parallel with other tasks (that do use the GPU) within the same serverless invocation! 🙂 If you figure it out, please report back! I wonder if it's something to do with the privileges made available to Docker containers in Runpod vs locally.
nerdylive
nerdylive•3w ago
Well, it's the software that decides which hardware to use, @rewbs. Maybe it doesn't support the GPU (e.g. Nvidia on Linux), or it's down to the OS it's running on, or the wrong driver. Not sure how the software works, so I can't debug it yet.
rewbs
rewbs•3w ago
The hardware is definitely there and supported. 🙂 My serverless endpoint kicks off two concurrent processes on the same serverless worker: one surfaceless EGL task (similar to the example codebase above), which fails to detect and use the Nvidia GPU, and one "standard" Python ML process, which does find and use the Nvidia GPU.
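Roughly, the shape of that worker looks like the sketch below. The handler structure follows the RunPod Python serverless SDK, but the renderer binary path and the inference helper are illustrative names, not the actual codebase:
```python
import subprocess
import runpod  # RunPod serverless SDK

def run_gpu_inference(payload):
    # Stand-in for the "standard" Python ML step, which does see the Nvidia GPU.
    return {"ok": True, "echo": payload}

def handler(job):
    # Hypothetical surfaceless EGL renderer, similar to the egldockertest example.
    # On RunPod this is the process that ends up falling back to CPU (llvmpipe).
    renderer = subprocess.Popen(["/app/render_visuals", "--platform", "surfaceless"])

    ml_result = run_gpu_inference(job["input"])  # runs concurrently with the renderer

    renderer.wait()
    return {"ml": ml_result, "renderer_exit_code": renderer.returncode}

runpod.serverless.start({"handler": handler})
```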
nerdylive
nerdylive•3w ago
I mean the software, not the hardware. Different software might not behave the same as other software.
rewbs
rewbs•3w ago
Oh. Which software are you referring to though? My code? (There are many layers of software in play here 🙂)
nerdylive
nerdylive•3w ago
Yep, that's what I'm not sure of, because I'm not able to read the code there, but if there are some docs describing its compatibility, maybe that can help. Oh, what is EGL? Is it a type of GPU or some custom hardware? I have no experience in these fields, sorry, so I might not be able to help you much.
rewbs
rewbs•3w ago
No worries, this is not an easy problem. EGL is an API layer that sits between OpenGL and the underlying platform, and it supports headless rendering (no display or window system needed).
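To make "headless" concrete, here is a sketch of the context-creation side, assuming a display handle `dpy` already obtained and initialised as in the enumeration snippet near the top, and the EGL_KHR_surfaceless_context extension: the GL context is made current with no window or pbuffer at all, and the application renders into its own FBOs.
```python
import ctypes

egl = ctypes.CDLL("libEGL.so.1")

# Constants taken from the EGL headers/registry.
EGL_SURFACE_TYPE, EGL_PBUFFER_BIT = 0x3033, 0x0001
EGL_RENDERABLE_TYPE, EGL_OPENGL_BIT = 0x3040, 0x0008
EGL_NONE, EGL_OPENGL_API = 0x3038, 0x30A2
EGL_NO_SURFACE = ctypes.c_void_p(0)
EGL_NO_CONTEXT = ctypes.c_void_p(0)

def make_headless_context(dpy):
    """Create an OpenGL context on display `dpy` (a ctypes.c_void_p) and bind it with no surface."""
    egl.eglBindAPI(EGL_OPENGL_API)

    attribs = (ctypes.c_int * 5)(EGL_SURFACE_TYPE, EGL_PBUFFER_BIT,
                                 EGL_RENDERABLE_TYPE, EGL_OPENGL_BIT, EGL_NONE)
    config = ctypes.c_void_p()
    num_configs = ctypes.c_int()
    egl.eglChooseConfig(dpy, attribs, ctypes.byref(config), 1, ctypes.byref(num_configs))

    egl.eglCreateContext.restype = ctypes.c_void_p
    ctx = ctypes.c_void_p(egl.eglCreateContext(dpy, config, EGL_NO_CONTEXT, None))

    # No window, no pbuffer: all rendering goes to application-created FBOs.
    return bool(egl.eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, ctx))
```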
nerdylive
nerdylive•3w ago
Oh wow. You might want to dig further into how EGL, or the libraries and code you're using, locate the Nvidia drivers / Nvidia's implementation.
tom
tom•3w ago
Finally got it, actually, by installing the right Nvidia driver (535) on our Debian slim image. We're not doing serverless though, just pods for now.