Daan

Issue with Dependencies Not Being Found in Serverless Endpoint

I am encountering an issue with a network volume I created. First, I created a network volume and used it to set up a pod. During this setup, I modified the network volume: in the directory where the volume was mounted, I created and activated a virtual environment (venv), and then installed various dependencies into it. Next, I created a serverless endpoint that uses this network volume. As far as I understand, the volume is mounted at the directory runpod-volume. I activate the venv located in this directory and then start a program that is also stored there. However, I quickly run into a problem: the dependencies I installed are not being found. Could you please help me identify where I might be going wrong in this process? It seems like the dependencies installed in the venv are not being recognized or accessed by the serverless endpoint. Thanks
Solution:
I think it is HIGHLY better to just bake the dependencies into the Dockerfile and activate it that way. Also, without seeing your Dockerfile it's hard to say...
Jump to solution
14 Replies
ashleyk
ashleyk6mo ago
venvs are stupid and hard-code their paths, so if you create one under /workspace in GPU cloud, that path will be hard-coded into the venv itself. I work around this by using symbolic links in serverless, e.g.
echo "Symlinking files from Network Volume"
rm -rf /workspace && \
ln -s /runpod-volume /workspace
source /workspace/venv/bin/activate
echo "Symlinking files from Network Volume"
rm -rf /workspace && \
ln -s /runpod-volume /workspace
source /workspace/venv/bin/activate
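The reason the symlink is needed: the venv's activate script (and the shebang line of every script in venv/bin) contains the absolute path from creation time. You can see the hard-coded path yourself - assuming the venv was created at /workspace/venv on the pod:
grep 'VIRTUAL_ENV=' /workspace/venv/bin/activate   # prints the hard-coded VIRTUAL_ENV="/workspace/venv"
head -n1 /workspace/venv/bin/pip                   # shebang prints #!/workspace/venv/bin/python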
Daan
Daan6mo ago
isn't renaming workspace to runpod-volume the better/easier option here?
ashleyk
ashleyk6mo ago
You can try that too
Daan
Daan6mo ago
Is there something special about a directory named "workspace" in RunPod? Because now that I've changed the name of the workspace directory to runpod-volume, I have this strange problem (see message.txt) where I download all the dependencies and then execute a file that needs them. In the logs I see this error (but I do not understand why):
2024-01-01T16:53:06.410909616Z Traceback (most recent call last):
2024-01-01T16:53:06.410938427Z File "/runpod-volume/./app.sh", line 5, in <module>
2024-01-01T16:53:06.413309706Z import torch
2024-01-01T16:53:06.413320126Z ModuleNotFoundError: No module named 'torch'
I guess these dependencies also aren't saved in the network volume then... (Yes, I changed the mount path in my template to "/runpod-volume")
Daan
Daan6mo ago
(this may be more a question for GPU cloud, but I will keep it here if that's okay)
nathaniel
nathaniel6mo ago
that latter pip install output says you're running out of disk in whatever place you're installing the dependencies
...
Installing collected packages: sentencepiece, mpmath, cymem, bitsandbytes, wasabi, urllib3, typing_extensions, tqdm, sympy, spacy-loggers, spacy-legacy, sniffio, smart-open, safetensors, regex, PyYAML, psutil, protobuf, packaging, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, murmurhash, msgpack, MarkupSafe, langcodes, idna, h11, greenlet, fsspec, filelock, distro, cloudpathlib, click, charset-normalizer, certifi, catalogue, annotated-types, typer, triton, srsly, scipy, requests, pynvim, pydantic_core, preshed, nvidia-cusparse-cu12, nvidia-cudnn-cu12, Jinja2, httpcore, blis, anyio, pydantic, nvidia-cusolver-cu12, huggingface-hub, httpx, torch, tokenizers, openai, confection, weasel, transformers, thinc, accelerate, spacy, en-core-web-sm
ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device

...
torch and its dependencies are in that list, which is why it can't import torch later. Maybe the size of the network volume is smaller than the size of the local disk for your container?
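If it is the container disk filling up, one workaround (a sketch, assuming your volume is mounted at /runpod-volume) is to point pip's temp directory at the network volume, since pip downloads and unpacks those big CUDA wheels under TMPDIR before installing:
mkdir -p /runpod-volume/tmp
export TMPDIR=/runpod-volume/tmp    # pip stages downloads and unpacked wheels here
source /runpod-volume/venv/bin/activate
pip install --no-cache-dir torch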
Daan
Daan6mo ago
the size of the network volume is 250GB, so that shouldn't be the problem here. When installing, I also only see the disk utilization of the container increasing, so maybe these dependencies are just being installed on the container disk after all (though I think the commands I used are correct)? Also, I do not fully understand why this happens: if I have installed some dependencies in the venv, they are saved on the network volume, and I can reuse them, because executing files (in the venv) that use those dependencies works fine. But if I try to reinstall these dependencies, I don't get something like "Requirement already satisfied: ...". Also, "pip list" doesn't show these dependencies. This is really strange to me.
ashleyk
ashleyk6mo ago
pip uses the user home directory to cache things by default; you can try adding --no-cache-dir to your pip commands to prevent it from caching things on the container disk. It solved it here: https://discord.com/channels/912829806415085598/1191380773425659924
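For example, something like this (assuming your requirements file also lives on the volume):
source /runpod-volume/venv/bin/activate
pip install --no-cache-dir -r /runpod-volume/requirements.txt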
Daan
Daan6mo ago
yes, I use --no-cache-dir, but I still got the error. I thought it was solved because I didn't get the same error (honestly, I didn't see this one-line error at first). This is the output now (message.txt). But I think everything is installed correctly, since I can execute a file that needs these dependencies. I find it strange, though, that if I execute the same command twice (after a new pod has been started with the same network volume), nothing comes up as "Requirement already satisfied: ...". But running the program does work. However, when I start a serverless endpoint and try to do the same, I get the error that those dependencies are not installed.
ashleyk
ashleyk6mo ago
Make sure your path names match exactly; the venv path is hard-coded, as I mentioned initially.
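A quick way to check which interpreter and pip you are actually using (the exact paths are just what I'd expect from your setup):
which pip                                   # should point into the venv, e.g. /runpod-volume/venv/bin/pip
python -c "import sys; print(sys.prefix)"   # should print the venv path, not /usr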
Solution
justin
justin6mo ago
I think it is HIGHLY better to just bake the dependencies into the Dockerfile and activate it that way. Also, without seeing your Dockerfile it's hard to say.
justin
justin6mo ago
Using a venv through your RunPod volume is kind of a waste of time - more frustration than it's worth, and you still get increased latency pulling packages across two different drives
justin
justin6mo ago
GitHub: runpodWhisperx/Dockerfile at master · justinwlin/runpodWhisperx
justin
justin6mo ago
You can refer to my WhisperX Dockerfile, where I do create a venv and activate it in the image to stabilize all my dependencies. Especially since this is serverless, and assuming you are storing any really heavy models on the network storage, I don't see any reason why a venv for dependencies baked into your Dockerfile itself would be bad. You'd still have a super small image, fast to deploy and fast to spin up.
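The general shape is something like this (a minimal sketch, not my actual Dockerfile - the base image and package list are placeholders):
# placeholder base image - swap in whatever CUDA/Python base you need
FROM runpod/base:0.4.0-cuda11.8.0
# create the venv inside the image so its path never changes between workers
RUN python3 -m venv /venv
# putting the venv first on PATH effectively "activates" it for every later layer
ENV PATH="/venv/bin:$PATH"
# bake the heavy Python dependencies into the image, not the network volume
RUN pip install --no-cache-dir runpod torch
COPY handler.py /handler.py
CMD ["python", "-u", "/handler.py"]
Heavy model weights can still live on the network volume - it's just the Python packages that belong in the image.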