Railway•5mo ago
Unsmart

Deployment removed before healthcheck

For some reason my old deploy is getting removed before the health check has actually completed for the new deployment :Hmmge: This is causing a brief period of downtime on every deploy.
15 Replies
Percy•5mo ago
Project ID: 8a562b1b-8488-472e-b420-02478d2a8df0
Unsmart•5mo ago
8a562b1b-8488-472e-b420-02478d2a8df0
Brody•5mo ago
increase RAILWAY_DEPLOYMENT_OVERLAP_SECONDS to 35
Unsmart•5mo ago
I assume I just put that in env? And is there a max value I can put? Might do a bit more than 35.
Brody•5mo ago
As a service variable. Start with 35 and increase from there; you can likely go up to something like 4 hours.
Unsmart•5mo ago
Yeah, this isn't doing anything... The new build gets published at 22:57:53 and the old deploy is gone at 22:58:00. Only 7 seconds :Bruh: And the health check didn't succeed until 22:58:12 :sad: I thought that if the health check didn't succeed, it wouldn't promote the new build at all, in case you accidentally deploy something that doesn't build properly :Hmmge:
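For reference, the gaps implied by those three timestamps can be worked out with a short Python sketch. The variable and function names are illustrative only, and it assumes (this is not confirmed in the thread) that the new deployment starts serving traffic at the moment its health check succeeds.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def seconds_between(start: str, end: str) -> int:
    """Return the whole seconds elapsed from start to end (same day)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# Timestamps quoted in the message above.
published = "22:57:53"       # new build published
old_removed = "22:58:00"     # old deploy gone
healthcheck_ok = "22:58:12"  # health check first succeeds

# How long the old and new deployments actually coexisted.
overlap = seconds_between(published, old_removed)

# Window with no healthy deployment (assumption: the new deployment
# only serves traffic once its health check passes).
downtime = seconds_between(old_removed, healthcheck_ok)

# Overlap that would have been needed to cover the whole health check.
needed_overlap = seconds_between(published, healthcheck_ok)

print(overlap, downtime, needed_overlap)  # 7 12 19
```

Under that assumption, the old deploy would have had to stay up at least 19 seconds after publish to avoid the gap, which is well under the 35 seconds suggested earlier.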
Brody•5mo ago
that's how it should work, yeah
Unsmart•5mo ago
Hmm, interesting. I have managed to break the health check system somehow :LUL:
Brody•5mo ago
congratulations!
Unsmart•5mo ago
Any chance you'd be able to get someone from Railway to look at why the healthcheck isn't working for me? :Prayge:
[Screenshots attached]
Duchess•5mo ago
Thread has been flagged to Railway team by @Brody.
Unsmart•5mo ago
tyty 😄 Also, the service in question is railway-cloudflared if needed. Feel free to restart it whenever; nothing important is hosted there 🙂
Melissa•5mo ago
innnteresting. this behavior is isolated to just this service? thanks for the screenshots, super helpful
Linear•5mo ago
Issue PRO-1854 created.
PRO-1854 - Active service is stopped before healthcheck succeeds
A healthcheck is configured for the service and is being executed; however, the logs (see screenshots in the thread) show that the active/healthy deployment is being stopped before the healthcheck succeeds, resulting in a period of downtime.
Status: Triage
Product
Unsmart•5mo ago
Yeah, only this service. I have another service in that same project that the health check works fine on (railway-rust). Well, I have found the issue and I guess it's working as expected. I was going to look into auto-scaling replicas for the service, but noticed I can't make a replica, and it's because of volumes... It says that a volume prevents multiple deployments, so the old deploy has to go before the new one can run, and the healthcheck is basically pointless lol. Sucks, because this service only needs a single read-only file :sad: