Railway•4mo ago
Tim

Django with Gunicorn gives frequent [CRITICAL] WORKER TIMEOUTs on simple requests

I have been trying to deploy an app (no real users on this deployment yet). The deployment works, but some requests that need barely any back-end processing (e.g. retrieving one model in the Django admin, without any calculated fields) lead to Gunicorn timeouts. After a few seconds it often works again, keeps working for a while, and eventually goes back to timing out. Suggestions are very much appreciated. I have 2 Gunicorn workers and 2 replicas, and I am the only user.
[2024-02-05 12:21:37 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:27)
[2024-02-05 12:21:38 +0000] [16] [ERROR] Worker (pid:27) exited with code 1
[2024-02-05 12:21:38 +0000] [16] [ERROR] Worker (pid:27) exited with code 1.
[2024-02-05 12:21:39 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:37)
[2024-02-05 12:21:39 +0000] [16] [ERROR] Worker (pid:37) exited with code 1
[2024-02-05 12:21:39 +0000] [16] [ERROR] Worker (pid:37) exited with code 1.
I am running it with Docker, using the following Dockerfile:
FROM python:3.11-slim
RUN pip install --upgrade pip
COPY ./requirements.txt .
RUN pip install -r requirements.txt
COPY . /app
WORKDIR /app
COPY docker_entrypoint.sh .
ENTRYPOINT ["sh", "/app/docker_entrypoint.sh"]
The entrypoint runs the migrations and eventually runs:
PYTHONPATH=`pwd`/project gunicorn project.wsgi.wsgi_production:application --timeout 60 --workers 2 --access-logfile - --log-level WARNING
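The same flags can also live in a Gunicorn config file, which makes them easier to review and change between deploys. A minimal sketch, assuming a file named gunicorn.conf.py in the working directory (the default config location in recent Gunicorn releases). The two commented-out settings are not part of the original command; they are options sometimes tried for Docker deployments and are listed here only as assumptions.

```python
# gunicorn.conf.py: mirrors the command-line flags used in the entrypoint.

# WSGI application to serve (this setting exists since Gunicorn 20.1).
wsgi_app = "project.wsgi.wsgi_production:application"

workers = 2           # --workers 2
timeout = 60          # --timeout 60 (seconds of worker silence before it is killed)
accesslog = "-"       # --access-logfile - (write the access log to stdout)
loglevel = "warning"  # --log-level WARNING

# Assumptions, not taken from the thread:
# worker_tmp_dir = "/dev/shm"  # keep the worker heartbeat file in memory
# threads = 4                  # more than 1 thread switches sync workers to gthread
```

With this file in place the command would shrink to `PYTHONPATH=`pwd`/project gunicorn`, since PYTHONPATH still has to be set the same way.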
21 Replies
Percy•4mo ago
Project ID: 45fb0a1b-efd2-4699-81c8-16a885d8c33c
Tim•4mo ago
45fb0a1b-efd2-4699-81c8-16a885d8c33c Update: I now also have requests hanging for 5 minutes before a definitive timeout.
[2024-02-05 14:56:50 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:33)
[2024-02-05 14:56:50 +0000] [16] [ERROR] Worker (pid:33) exited with code 1
[2024-02-05 14:56:50 +0000] [16] [ERROR] Worker (pid:33) exited with code 1.
[2024-02-05 14:57:52 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:35)
[2024-02-05 14:57:52 +0000] [16] [ERROR] Worker (pid:35) exited with code 1
[2024-02-05 14:57:52 +0000] [16] [ERROR] Worker (pid:35) exited with code 1.
[2024-02-05 14:58:53 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:37)
[2024-02-05 14:58:53 +0000] [16] [ERROR] Worker (pid:37) exited with code 1
[2024-02-05 14:58:53 +0000] [16] [ERROR] Worker (pid:37) exited with code 1.
[2024-02-05 14:59:54 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:39)
[2024-02-05 14:59:55 +0000] [16] [ERROR] Worker (pid:39) exited with code 1
[2024-02-05 14:59:55 +0000] [16] [ERROR] Worker (pid:39) exited with code 1.
Brody•4mo ago
Had another user with the exact same problem on the same tech stack. Their issue turned out to be incorrect database credentials: the database connection was hanging and then silently failing, blocking all requests while doing so.
Tim•4mo ago
I am using
DEFAULT_DB_URL=${{Postgres.DATABASE_PRIVATE_URL}}
and that can't explain why it does work sometimes, right?
Brody•4mo ago
Is that the environment variable you are using in code? Unless you are using a database URL module, Django only accepts separate database credentials. Show me the database stuff in your settings.py, please.
Tim•4mo ago
I am using django-environ
DATABASES = {
    "default": env.db("DEFAULT_DB_URL"),
}
Migrating the database, loading the fixtures, etc. all work (I have a sleep 2 on startup to ensure the database connection is ready, as recommended in another post). And I can see the data in my admin panel when it does not time out.
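For context, Django's DATABASES setting expects separate keys (NAME, USER, PASSWORD, HOST, PORT), and a helper such as env.db() splits a URL into those keys. A rough sketch of that translation using only the standard library. The function name and the example URL are made up for illustration, and django-environ's real parser handles more schemes and options than this.

```python
from urllib.parse import unquote, urlparse


def database_url_to_settings(url: str) -> dict:
    """Split a postgres:// URL into the separate keys Django expects."""
    parts = urlparse(url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": unquote(parts.path.lstrip("/")),
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": parts.port or "",
    }


# Hypothetical URL, for illustration only.
settings = database_url_to_settings("postgresql://app:s3cret@db.internal:5432/appdb")
print(settings["HOST"], settings["PORT"], settings["NAME"])
```

Printing the resulting dictionary at startup (with the password masked) is one way to confirm that the host and port really are the private ones the service is supposed to use.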
Brody•4mo ago
sleep 3 is recommended. 2 seconds is pushing it, because the time until the connection is ready does tend to exceed 2 seconds.
Tim•4mo ago
I increased it to 5 to be on the safe side, but this won't resolve my timeouts 😅
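An alternative to guessing a sleep duration is to poll until the database port accepts connections. A minimal sketch, assuming the host and port are taken from the database URL. The function name is hypothetical, and a successful TCP connection only shows that the port is reachable, not that Postgres is ready to answer queries.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # The socket is closed again as soon as the connection succeeds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: wait briefly and retry.
            time.sleep(interval)
    return False
```

The entrypoint could call this before running migrations and exit with an error when it returns False, so a database that never becomes reachable shows up as a clear startup failure.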
Brody•4mo ago
There's some other piece of code somewhere that's blocking. I'd recommend adding verbose debug logging to find out at what point your app is locking up.
Tim•4mo ago
Ok, will do that. Do you mean in Gunicorn or in Django or in my postgres service (or all of them) 🤔 ?
Brody•4mo ago
In Django.
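One way to get that kind of logging is a small timing middleware placed first in MIDDLEWARE. A sketch only: the class name and the "request_timing" logger name are hypothetical, and the logger has to be set to DEBUG in the LOGGING setting for the lines to appear.

```python
import logging
import time

logger = logging.getLogger("request_timing")


class RequestTimingMiddleware:
    """Log when a request enters and leaves the Django middleware stack."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        start = time.monotonic()
        logger.debug("start %s %s", request.method, request.path)
        try:
            return self.get_response(request)
        finally:
            # Runs for normal responses and for exceptions alike.
            elapsed = time.monotonic() - start
            logger.debug("end %s %s after %.3fs", request.method, request.path, elapsed)
```

If a "start" line appears without a matching "end" line, the hang is inside Django or the view. If no "start" line appears at all for a hanging request, the request has not reached Django yet, which points at the layer in front of it instead.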
Tim•4mo ago
I added debug logging, but I don't see anything in my logs when this happens.
[2024-02-08 18:58:05 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:17)
[2024-02-08 19:58:05 +0100] [17] [INFO] Worker exiting (pid: 17)
[2024-02-08 18:58:06 +0000] [16] [ERROR] Worker (pid:17) exited with code 1
[2024-02-08 18:58:06 +0000] [16] [ERROR] Worker (pid:17) exited with code 1.
[2024-02-08 18:58:06 +0000] [19] [INFO] Booting worker with pid: 19
[2024-02-08 18:59:09 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:19)
[2024-02-08 19:59:09 +0100] [19] [INFO] Worker exiting (pid: 19)
[2024-02-08 18:59:09 +0000] [16] [ERROR] Worker (pid:19) exited with code 1
[2024-02-08 18:59:09 +0000] [16] [ERROR] Worker (pid:19) exited with code 1.
[2024-02-08 18:59:09 +0000] [21] [INFO] Booting worker with pid: 21
[2024-02-08 19:00:11 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:21)
[2024-02-08 20:00:11 +0100] [21] [INFO] Worker exiting (pid: 21)
[2024-02-08 19:00:11 +0000] [16] [ERROR] Worker (pid:21) exited with code 1
[2024-02-08 19:00:11 +0000] [16] [ERROR] Worker (pid:21) exited with code 1.
[2024-02-08 19:00:11 +0000] [23] [INFO] Booting worker with pid: 23
[2024-02-08 19:01:12 +0000] [16] [ERROR] Worker (pid:23) exited with code 1.
[2024-02-08 19:01:12 +0000] [25] [INFO] Booting worker with pid: 25
[2024-02-08 20:01:13 +0100] [25] [DEBUG] GET /super-admin/campaigns/campaign/
The GET request only shows up in the log after it has been hanging for a while.
Brody•4mo ago
Something in your code is freezing and causing the request to take longer than 30 seconds, unless you have something that should take longer than 30 seconds?
Tim•4mo ago
It happens on arbitrary requests that should not take any significant amount of time (nowhere near a second), and it happens neither locally nor on my PythonAnywhere-hosted test server (which does not deploy with Docker) 🤔.
Brody•4mo ago
Railway runs your code as is; PythonAnywhere is likely monkeypatching away some bugs in your code.
Tim•4mo ago
I already set the Gunicorn timeout to 60 to test whether the request would eventually finish (it does not).
Brody•4mo ago
Without an error or any logs to go off of, there's not much I can help you with here, unfortunately.
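When a hang produces no logs at all, the standard library's faulthandler module can dump the Python stack of a stuck process after a deadline. A sketch under assumptions: the helper name is hypothetical, and because faulthandler keeps a single process-wide timer, this only fits workers that handle one request at a time, such as Gunicorn's sync workers.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def dump_stack_if_slow(seconds: float, file=None):
    """Dump every thread's traceback to `file` if the block runs longer than `seconds`."""
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(seconds, file=target)
    try:
        yield
    finally:
        # Cancel the watchdog so fast code paths produce no output.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the `get_response(request)` call of a middleware in `with dump_stack_if_slow(20):` would print where the worker is stuck before the 60 second Gunicorn timeout kills it. It stays silent if the request never reaches Django, which is itself a useful signal.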
MIGHTY_MIDHUN•3mo ago
Did you find a fix for this? I'm facing a similar issue.
Tim•3mo ago
I did not. Based on my logging it seems to be hanging on loading static files, which I am serving through WhiteNoise (6.6.0):
INSTALLED_APPS = [
    ...
    "whitenoise.runserver_nostatic",
    "django.contrib.staticfiles",
    ...
]

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "whitenoise.middleware.WhiteNoiseMiddleware",
    ...
]

STORAGES = {
    "default": {
        "BACKEND": "storages.backends.gcloud.GoogleCloudStorage",
    },
    "staticfiles": {
        "BACKEND": "whitenoise.storage.CompressedManifestStaticFilesStorage",
    },
}
Brody•3mo ago
Please reference the WhiteNoise docs on how to configure it properly.
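One thing the WhiteNoise setup depends on, and which the thread does not show, is that collectstatic has populated STATIC_ROOT inside the image. The manifest storage backend writes a staticfiles.json file there. A small sanity check along those lines; the function name is hypothetical, and whether a missing manifest was the cause here is not established in this thread.

```python
from pathlib import Path


def static_root_problems(static_root: str) -> list[str]:
    """Return human-readable problems found in a collected static directory."""
    root = Path(static_root)
    if not root.is_dir():
        return [f"{root} does not exist; was collectstatic run during the build?"]
    problems = []
    # The manifest storage writes this file during collectstatic.
    if not (root / "staticfiles.json").is_file():
        problems.append("staticfiles.json manifest is missing")
    # Any regular file other than the manifest counts as collected content.
    if not any(p.is_file() for p in root.rglob("*") if p.name != "staticfiles.json"):
        problems.append("no collected files found")
    return problems
```

Running this against the configured STATIC_ROOT from the entrypoint, before Gunicorn starts, would at least rule out an empty or missing static directory.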
Tim•3mo ago
@MIGHTY_MIDHUN I solved it eventually by moving my static files to a Google Cloud Storage bucket.