Hi Gadget team, we're running into a really weird issue. We have a Gadget app (kamera-express-app) with a staging environment. Our workflow is: we merge feature branches into develop, then merge develop into staging via a PR on GitHub, and then deploy to the Gadget staging environment using ggt push --env=staging from CI.
We recently merged our develop branch into staging; the merge includes all our custom actions, models, routes, and a POS extension. After deploying to the staging Gadget environment, all actions started returning 500 errors after exactly 10 seconds. The sandbox never provisions: there are zero js-sandbox-* log entries for any of the failing requests. As far as we can tell, the API gateway just times out waiting for a sandbox that never starts.
It worked briefly about 30 minutes ago: logs showed a healthy workerID: 13 on js-sandbox-dev-node-22-15-8579798c48-7nmh9 running actions successfully. Then it stopped completely.
We've ruled out code issues, since even a bare ping action with zero imports that just returns { ok: true } returns a 500 (see attached image). The exact same codebase works fine on our development environment. It might be a sync issue, but we're not too sure. We also tried debugging with the built-in AI, but couldn't narrow down the issue.
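For reference, the ping action is essentially the following. This is a minimal sketch rather than a copy of the file in our repo: the path in the comment is illustrative, and it assumes the usual Gadget global action shape of an exported async run function.

```typescript
// Illustrative path (assumption): api/actions/ping.ts
// No imports, no model access, no external calls.
// Assumes Gadget invokes the exported `run` function of a global action.
export const run = async (): Promise<{ ok: boolean }> => {
  // Return a constant payload so that any failure cannot come from our own logic.
  return { ok: true };
};
```

Since there is nothing in this action that can throw or block, a 500 after 10 seconds points at something happening before our code runs.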
Failing trace IDs:
- 49ad4e9bc1799797d31764b7cba143b3
- 8dada1ff8ee3679ef4ec8fef3b425482
- f3bf831e55fe5f7540db6dcdef4d3bf4
Last working trace ID (~30 min ago):
- 0dab51c121b6de98df9c2de79e5b3ec0 (on reloaderProxyId: DZGnjYapYvRCVa5812JQb)
All requests now hit api-apps-shard-4-77954b78b8-* pods and time out with no sandbox logs. Any idea why the sandbox stopped provisioning on the staging environment?