Issues updating from Coder 2.13.3 to 2.22.1
Hi guys,
We're having issues updating Coder, where the update never completes, or fails somewhere without producing any logs.
In our setup we're deploying Coder in our AWS EKS cluster via the existing Helm chart.
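For context, our helmfile release looks roughly like the sketch below. The repository name, namespace, and values file are simplified/illustrative, not our exact setup; the chart and target version reflect the upgrade we're attempting.

```yaml
# Rough sketch of a helmfile release for the Coder chart (illustrative).
# Repo/namespace/values names are placeholders; the version is the
# upgrade target (2.13.3 -> 2.22.1).
repositories:
  - name: coder-v2
    url: https://helm.coder.com/v2

releases:
  - name: coder
    namespace: coder
    chart: coder-v2/coder
    version: 2.22.1
    values:
      - values.yaml
```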
We tried the update in different staging environments and it succeeded with no issues, but in production something unexpected happens. We're not running HA mode, so we have one pod running (the active pod), and when we run the helmfile update command, a new pod spawns (the update pod). The update pod seems to be stuck in a Running state, constantly restarting because of the liveness probe. No logs are shown in the update pod other than the standard header and the web UI URL ("WARN: CODER_TELEMETRY is deprecated, please use CODER_TELEMETRY_ENABLE instead. Started HTTP listener at http://0.0.0.0:8080"). The active pod also becomes unreachable (you cannot reach it via the web UI).
We also tried scaling the deployment to 0 replicas and then performing the update, but without success. The same issue happens.
After we roll back the Helm release, everything goes back to normal.
Thank you in advance!
Category: Help needed
Product: Coder (v2)
Platform: Linux
The upgrade should be fine. But given it's a huge version bump, I would advise upgrading gradually and checking for any breaking changes in each release.
Also, any logs can help.
The thing is, there were no logs; the only bit of information we could gather was from describing the update pod (in the scenario where I scaled the original deployment down to 0 replicas and then performed the update), and the events were just about the liveness and readiness probes failing.
Which we thought was weird, because it seems the Coder application didn't even start.
To our surprise, manually bumping the deployment's liveness probe to a higher timeout seems to have fixed the issue, and the update succeeded. It looks like the probe was firing faster than the database migration could finish, due to some internal state. Is this possible? We checked the database and there were no connections to it.
This specifically is weird, because the length of the database migration seems pretty non-deterministic.
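For anyone hitting the same thing, below is a minimal sketch of the kind of livenessProbe override that worked for us on the coder Deployment. The health endpoint, port name, and exact timings are illustrative assumptions, not the chart's documented defaults; the point is giving the pod enough headroom to finish migrations before Kubernetes restarts it.

```yaml
# Sketch of a relaxed livenessProbe for the coder container (illustrative).
# Path and port name are assumptions about the chart's health endpoint;
# tune the timings to how long your database migration actually takes.
livenessProbe:
  httpGet:
    path: /healthz
    port: http
  initialDelaySeconds: 60   # delay before the first probe runs
  periodSeconds: 10         # probe interval
  timeoutSeconds: 5
  failureThreshold: 10      # ~100s of failed probes before a restart
```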