Booting worker looping infinitely
Hi there, I'm facing an issue where "Booting worker with pid: XXX" is looping infinitely despite no exit signals. The logs are:
[2023-05-20 02:24:28 +0000] [1] [INFO] Starting gunicorn 20.0.4
[2023-05-20 02:24:28 +0000] [1] [INFO] Listening at: http://0.0.0.0:6516 (1)
[2023-05-20 02:24:28 +0000] [1] [INFO] Using worker: sync
[2023-05-20 02:24:28 +0000] [10] [INFO] Booting worker with pid: 10
Loading model...
Model loaded
Starting prediction...
[2023-05-20 02:25:49 +0000] [58] [INFO] Booting worker with pid: 58
Loading model...
Model loaded
Starting prediction...
[2023-05-20 02:25:55 +0000] [106] [INFO] Booting worker with pid: 106
I've looked at https://github.com/benoitc/gunicorn/issues/1663 and https://medium.com/@soleilstudio/booting-worker-with-pid-n-in-looping-infinitely-1c70a604231e and it seems to be a memory problem. I've upgraded to the developer plan and I do see spikes to 512MB under Metrics > Memory, so I suspect that is the issue.
Project ID: f1c85024-f36a-4e32-89f1-694999669ee1
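For context on the log above: gunicorn immediately respawns any worker that dies, so a worker that gets OOM-killed mid-prediction shows up as an endless stream of "Booting worker with pid" lines even though the app itself never logs an exit. A minimal sketch of a lower-memory start command, assuming a Flask app object named app in main.py rather than this project's actual layout:

web: gunicorn --preload --workers 1 --timeout 120 --bind 0.0.0.0:$PORT main:app

With --preload, the app (and anything loaded at import time) is loaded once in the master process and shared with the worker copy-on-write instead of being loaded again per worker.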
Solution
have you since re-deployed your app after upgrading to dev plan?
oh let me try that
takes about 7min to deploy so it'll be a while
damn, why?
by the way, is there a way to set usage limits even on a usage-based plan?
did you pip freeze your entire system's python packages into the requirements.txt file?
I'm trying to deploy a simple AI model and I suspect it's a combination of the weights as well as installing pytorch and torchvision
chonky
click==7.1.2
Flask==1.1.2
gunicorn==20.0.4
itsdangerous==1.1.0
Jinja2==2.11.3
MarkupSafe==1.1.1
Werkzeug==1.0.1
matplotlib>=3.2.2
numpy>=1.18.5,<1.24.0
Pillow>=7.1.2
PyYAML>=5.3.1
requests>=2.23.0
scipy>=1.4.1
torch>=1.7.0,!=1.12.0
torchvision>=0.8.1,!=0.13.0
tqdm>=4.41.0
protobuf<4.21.3
opencv-python-headless==4.7.0.72
yeahhh
oh wow that's not even big
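Side note on how requirements files get bloated: pip freeze dumps every package in the current environment, transitive dependencies included, whereas a hand-maintained list of direct dependencies like the one above stays small. A rough sketch of the two approaches, assuming pipreqs is installed for the second:

pip freeze > requirements.txt   # everything installed in the environment
pipreqs .                       # only packages the project actually imports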
can you get pre-compiled versions of pytorch?
there's not, it's on the roadmap though
perfect, thanks!
I'll look into that once this works, thanks for the tip
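One way to get smaller pre-built pytorch wheels, a sketch of an option rather than something done in this thread, is pointing pip at the CPU-only wheel index in requirements.txt, since the default Linux wheels bundle CUDA; the version pins below are only examples:

--extra-index-url https://download.pytorch.org/whl/cpu
torch==1.13.1+cpu
torchvision==0.14.1+cpu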
Build time: 154.82 seconds
build time itself isn't too long, just 2.5min
damn, I have build times around 20 seconds
but I'm not doing ai stuff, so it's not comparable
yeahh gigantic libraries are the bane of my existence
the final docker image is 4.9GB
holy shit, right now in #🎤|chit-chat we are talking about image sizes
and have come to the conclusion that anything above 100mb is gigantic
.-.
chonk
you indeed have a chonker
anyways it works now, thanks! getting a different error but seems to be more code/dependency related rather than memory
maybe I might know how to help with the error, maybe not, let's see the error?
it's 'Upsample' object has no attribute 'recompute_scale_factor'
oh yeah code issue 🙂
was thinking it was more along the lines of a missing system library
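For reference, that AttributeError is a known mismatch: a model pickled under an older torch has Upsample modules without the recompute_scale_factor attribute that newer torch versions read in forward(). A hedged sketch of the usual workaround, assuming the loaded model object is named model:

import torch.nn as nn

# patch Upsample modules pickled by an older torch so the newer
# forward(), which reads self.recompute_scale_factor, doesn't crash
for m in model.modules():
    if isinstance(m, nn.Upsample) and not hasattr(m, "recompute_scale_factor"):
        m.recompute_scale_factor = None

The alternative is pinning torch/torchvision to the versions the checkpoint was created with.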
okay you are all set, happy coding
thank you!