Sleeping service not waking in response to POST on internal network

Hi All,
Project ID: f54d62f5-02ea-4819-9882-dd03c6717a60
I have a public backend built with FastAPI and Hypercorn that handles login and some services. I am trying to spin out services into the private network for scalability and security. To test this, I have a private FastAPI service that hosts a language model.
There seem to be some problems with the interaction between sleep and internal networking.
When the service is not sleeping, if I send a POST request from the public API docs or from the front end, I get a 200 response as expected.
Once the service is sleeping, it does not wake up when it gets a POST request either from the public API or from the front end.
Logs on the public API show lines including:
[86] [ERROR] Error in ASGI Framework
Connection timed out
Max retries exceeded
There are no logs on the private service after it goes to sleep.
Is this a known issue? Is there a potential workaround for this? And is Railway suited for this use case?
I can find hacky ways around this issue that are less efficient, but I would prefer to use the sleep and private networking features if at all possible.
Thanks in advance for any help with this.
Choo-choo.
27 Replies
Brody
Brody14mo ago
you are indeed correct, you can not wake a sleeping service over the private network. I asked the team about this about 30 minutes after the sleep feature released a few days ago and will get back to you when I have an answer. cc @Mig
Solution
Mig
Mig14mo ago
Thanks for the cc Brody. @Gambaru Consulting awake on private networking not working is a known issue. It's something I really want to fix though! The reason is because the private networking goes through a Wireguard tunnel and this prevents the current implementation from capturing the incoming traffic so we can hold on to it and start your application.
Mig
Mig14mo ago
I'm going to be working on the docs for app sleep this week to make issues like this more clear.
Brody
Brody14mo ago
soon ™️ got it 😉
Gambaru Consulting
Thanks Brody and Mig for confirming. Please let me know if there's anything I can do to help this issue get resolved faster.
BrianJM
BrianJM14mo ago
Are there known workarounds? I did not realize this until now (and it explains some things I observed). I have an idea. Place an HAProxy layer in between as a load balancer, with one public and one private domain as the backends, and set the load balancing so that the public domain is the backup. That would first attempt to use the private network, and fall back to public. Once the app wakes, the private network would be used. HAProxy uses almost no CPU and negligible RAM. This would probably cost around $1/mo, maybe less.
Mig
Mig14mo ago
hey @BrianJM, there aren't any workarounds really unless your client falls back to the non-private networking URL to wake. I think this isn't an acceptable workaround though. I've been discussing with the team on how to get app sleep working for private networking enabled workloads. It was shipped with that as a compromise since the feature will still be useful for most of our users but we can then add in the support for private networking. Sorry for the lack of information on when we launched it though. I think your idea could work with our load balancer (Envoy). I'll see how much we'd need to get that added and if it still makes sense for the private networking feature.
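For anyone who wants to try the client-side fallback Mig mentions here, a minimal sketch might look like the following. It assumes the sleeping service has both a private URL and a public URL, and that a request to the public URL wakes it. The function name `post_with_fallback` and the `opener` parameter are made up for illustration, and the URLs you pass in would be your own. Keep in mind Mig does not consider this an acceptable long-term workaround.

```python
import urllib.error
import urllib.request


def post_with_fallback(private_url, public_url, payload, timeout=5.0,
                       opener=urllib.request.urlopen):
    """POST to the private URL first; on a connection failure, retry on the public URL.

    Returns (url_that_answered, status_code, body_bytes).
    `opener` is injectable only so the function can be exercised without a network.
    """
    last_error = None
    for url in (private_url, public_url):
        request = urllib.request.Request(
            url,
            data=payload,
            method="POST",
            headers={"Content-Type": "application/json"},
        )
        try:
            with opener(request, timeout=timeout) as response:
                return url, response.status, response.read()
        except urllib.error.HTTPError:
            # The service answered with an HTTP error, so it is awake.
            # Falling back would not help; let the caller handle it.
            raise
        except OSError as exc:
            # URLError and socket timeouts are OSError subclasses: the service
            # was unreachable (possibly asleep), so try the next URL.
            last_error = exc
    raise last_error
```

The private URL is tried first on every call, so once the service is awake traffic goes back over the private network on its own, which is the same behaviour as the primary/backup HAProxy idea above.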
BrianJM
BrianJM14mo ago
Thanks for the feedback. I can probably test out the HAProxy idea next week and create a template (if viable). No need to apologize regarding the rollout information. I just noticed it didn't seem to wake (internal network) so I disabled it. I tried on a public app (I only use) and noticed the app does not sleep as often as I expected, and wakes unexpectedly. I think this may be due to bots scanning domains with LE certificates (this data is publicly accessible).
Gambaru Consulting
Hi All, just following up to see if there are any updates to this situation. If there are updates in the future, where can I expect to find them?
Mig
Mig13mo ago
@Gambaru Consulting There aren't any updates on this. As of writing we won't be able to resolve this issue yet but the team is 100% aware of it (not just me). I understand how significant this drawback to app sleep is but we haven't gotten enough feedback that this is impacting enough users to switch from other goals for this quarter. I have a potential workaround for you though. I could add support to prevent sleeping containers that only have private network traffic. The issue then is waking them up (the part preventing me from just fixing this now). The workaround is: to wake up your slept container you could add a TCP proxy and, when your other application fails to reach the slept app over the internal domain, create a TCP connection to wake it up. I am fairly certain that would work. If you're interested in that workaround let me know so I can get the first part added. cc @BrianJM
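A rough sketch of the calling side of that workaround, for illustration only. It assumes the sleeping service has a TCP proxy host and port reachable from the caller and that opening a TCP connection to it is enough to trigger a wake, which Mig describes as likely but untested. The helper names `wake_via_tcp` and `call_with_wake` are hypothetical.

```python
import socket
import time


def wake_via_tcp(host, port, timeout=5.0):
    """Open and immediately close a TCP connection; return True if it connected."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def call_with_wake(call, wake, retries=3, delay=2.0, sleep=time.sleep):
    """Run call(); on a connection-level error, run wake(), wait, and retry.

    `call` is any zero-argument function that talks to the private service and
    raises an OSError subclass (ConnectionError, TimeoutError, urllib's URLError)
    when the service cannot be reached.
    """
    for attempt in range(retries + 1):
        try:
            return call()
        except OSError:
            if attempt == retries:
                raise  # still unreachable after all retries
            wake()
            sleep(delay)  # give the container time to start
```

In use, `call` would wrap the normal request to the internal domain and `wake` would be something like `lambda: wake_via_tcp(proxy_host, proxy_port)`, with the proxy host and port taken from your own service settings.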
BrianJM
BrianJM13mo ago
@Mig If I understand correctly, this would work for me. I believe this is similar to my previous idea (I didn't test yet) with HAProxy load balancing internal as primary and external as backup (to wake it up).
pandas
pandas13mo ago
Could use waking up containers on private network as well. It's a pretty neat feature to have
Brody
Brody13mo ago
thats what this whole thread is about??
pandas
pandas13mo ago
"but we haven't gotten enough feedback that this is impacting enough users to switch from other goals for this quarter" feedback on that, sorry
Mig
Mig12mo ago
Thanks for commenting pandas. @BrianJM yeah, the solution is pretty similar to your suggestion! So right now we actually almost (from my understanding) have private networking working with app sleep so I'm hesitant to suggest the workaround now. I can't give a date but I'm expecting within the next few weeks (< month). I can ping people here once I have more info on it.
!remind me to report app sleep privnet status in 1 week
Duchess
Duchess12mo ago
Got it, I will remind you to report app sleep privnet status at Mon, 20 Nov 2023 17:11:49 GMT
Gambaru Consulting
Hi Mig, thanks very much for getting back to me, and sorry for the radio silence. I was more or less trying to figure out what the timeline is at the moment so that I can plan around that. Since the plan right now is for next quarter, I'm focusing on some other issues first and deprioritizing scaling until there's a change. I'll continue watching this space for updates.
Pepijn
Pepijn9mo ago
Any update on this?
Brody
Brody9mo ago
char can bonk me if i wasn't supposed to say anything, but wake via private network is reported to work with the new runtime thats currently in testing, though only with tcp traffic
Pepijn
Pepijn9mo ago
That would be great!
sixfalls
sixfalls7mo ago
any new info on this?
Brody
Brody7mo ago
char can bonk me again, the v2 runtime is pre-alpha right now
Duchess
Duchess7mo ago
New reply sent from Help Station thread:
any update? we really need to awake on private network
You're seeing this because this thread has been automatically linked to the Help Station thread.
Brody
Brody5mo ago
@Gambaru Consulting @BrianJM @pandas @Pepijn @sixfalls - I know this thread is old, but wake via private networking is now supported!
Duchess
Duchess3mo ago
New reply sent from Help Station thread:
Check out their recent changelog, runtime v2 is public alpha now, but it doesn't support app sleeping yet, when it does, it will support wake via private networking.
Does only the app that will be sleeping need to be on runtime v2? Or does the app that will send a request to the sleeping app also need to be on runtime v2?
New reply sent from Help Station thread:
All services should be on V2.