Can't seem to get the healthcheck to work, works fine locally over FastAPI and Flask
service ID: 2a262f8f-be17-475a-8463-21e12fafebbf
I really hate opening this ticket guys, i'm sorry in advance but i can't seem to figure it out, i'm sure it's something small i must have missed
im running a pretty big python repository with 7-10 seconds worth of healthchecks being done before returning status 200,
however when i deploy on railway it just keeps timing out
(the API itself worked fine with the current config, it's just the healthcheck endpoint which is acting up)
(for context i am running main.py from my Procfile and my API is in another python file, both are being initialized though)
Also the API has to run on port 4242 as it's interacting with the Stripe API via webhooks
Project ID: 2a262f8f-be17-475a-8463-21e12fafebbf
if anyone would have some sparetime and would maybe be willing to try and help me out i'd greatly appreciate it
@brody192 what are my options to run multiple processes concurrently if procfiles are off the board, i dont really wanna subprocess into different py files
w popen
you wanna go over that before we get the health check working?
that is the root problem of my healthcheck
as i have a main file
and i had a separate api file to have my endpoints
and well i was trying to init both of them separately via the procfile
hence the endpoint not working
deploying via 2 services is not really an option as that would defeat the point of the healthcheck
interesting setup you have
yes
what does the main.py file do on its own?
its the main handler as its a mono repo
the repo is like 8 or 9k lines
how many services in this mono repo
i wire all thru 1
or well i did till the healthcheck had other ideas
how many different ports are in use
1 pre-set for stripe and 1 that railway assigns itself i believe?
I see, but unfortunately, per service you can only expose one port publicly, and ideally your app listens on $PORT
i see, i suppose i can route all over 1 port no? as long as my endpoints are different
its only internal traffic that goes over that api
so its not a big deal
yep the running solution for services that listen on multiple ports is to use endpoints
its ok i only run stripe over 4242 and the other port would be health
ideally in the future you would be able to map internal ports to different domains on port 443 externally
i have 3 assigned domains
but the healthcheck does need to listen to $PORT since that check is made internally
yeah there's no native way to map those external domains to internal ports on your service, without running a proxy that does host matching
ahh
okay okay
thats a shame
endpoints it is
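A minimal sketch of the "one port, several endpoints" idea, using only the standard library so it runs anywhere. The paths `/health` and `/stripe/webhook` are hypothetical names, and the fallback to 4242 when `PORT` is unset is an assumption for local runs; the only thing taken from the thread is that the app should listen on `$PORT`.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional


class Handler(BaseHTTPRequestHandler):
    """Routes by path so a single listening port can serve several concerns."""

    def _reply(self, status: int, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        if self.path == "/health":  # hypothetical healthcheck path
            self._reply(200, b"ok")
        else:
            self._reply(404, b"not found")

    def do_POST(self) -> None:
        if self.path == "/stripe/webhook":  # hypothetical webhook path
            length = int(self.headers.get("Content-Length", 0))
            self.rfile.read(length)  # a real handler would verify the payload here
            self._reply(200, b"received")
        else:
            self._reply(404, b"not found")

    def log_message(self, *args) -> None:
        pass  # keep the example quiet


def make_server(port: Optional[int] = None) -> ThreadingHTTPServer:
    """Bind to the given port, else $PORT, else 4242 (assumed local default)."""
    if port is None:
        port = int(os.environ.get("PORT", "4242"))
    return ThreadingHTTPServer(("0.0.0.0", port), Handler)

# To run it: make_server().serve_forever()
```

With this shape the webhook traffic and the healthcheck share one listener, so nothing depends on a second port being reachable.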
if you want an example of that, I have one prepared
just so you don't need to modify any of your code
yes sure
if you want
okay one sec, let me find, since I whipped up an example for someone else that wanted to map internal ports to different subdomains on the same service
https://discord.com/channels/713503345364697088/1154106744306421830/1154267922714345523
if you need any clarifications on anything I said in that thread just ask
quite impressive that you came up w that bro
jeesus
everything I've learnt about railway is from being with the community
thats amazing, i never been in any coding communities
but basically the crux of what you say
i cant have 4242
if i want the health path to work
without a proxy server
yeah since your app listens on different ports
well guys
that was 4 hours
down the drain
back 2 github actions i go
thanks for the help i appreciate it
haha I was asleep 4 hours ago, wish I could have gotten to help you sooner
imma try 1 sketchy thing
ouuu what ya got in mind
imma subprocess the bitch
im sick of it
via daphne
does what your going to subprocess need to access the same filesystem as the rest of the monorepo services?
im subprocessing the endpoint for the healthcheck
because you could just run the 3 things separately in 3 different services
it would defeat the point
unless your healthcheck actually does more than just return 200
the whole idea of having the health check halter is to stop any faulty commits coming thru
so if i have the same repo on multiple services
one would always need to be out of sync w commits
of the other
thats not really practical
i believe it does like
39 healthchecks internally
or 37 im not sure
impressive
and that just returns 1 int if all pass
yeah i mean we cant afford our main branch being down for some stupid reason
that's a whole lot more thorough than
return 200
in a /health
route
yes
isnt that the point of it though
or am i wrong hahaha
woops
no you are definitely using a health check properly, though simply returning a fixed status code of 200 is still useful too
not for your case, but you know
yeah but then i could just use the webhooks from railway no?
for the deployment status change
if thats all i want to query
does a healthcheck have to be completed in between the retries?
for simple code bases it's often plenty to just return 200 so that railway knows when your new deployment is ready to start accepting requests, giving you less of a switch over period
ahhh
i see i see
this is our main backend repo so 80% of our code is here
railway will just retry your healthcheck endpoint for up to 5 minutes
hence why it big
yes i know but it does like
20 retries or something
is there a time limit how long the check can last in between the retries
thats what i wonder
and if it never gets a 200, it will never switch in your new deployment
yes i like that
oh I see what you mean
as github actions is slow
what is the timeout of the individual check
hcheck takes longer than 10 seconds
i dont know
thats what i wonder
that's a good question
i have tried coroutining
everything already
but i cant get below 7 seconds
I'm not 100% sure but I can't imagine it wouldn't wait for 10 seconds
yeah i mean u never know
could be something small
what are your deployment logs looking like during the healthcheck attempts
attempt # 9392932 failed
every few seconds
what does that correlate to in code? failed database connection or something?
i dont know its generated by railway
it doesnt get past building
if it doesnt pass health
yes it does actually
ah ok
for me it doesnt
your deployment is still run, there should be deployment logs
since your deployment needs to be running for any type of healthcheck to work
oh i see
wtf
i didnt see that up until now
build
click deploy logs
its clearly returning 200
well isn't that odd
but railway doesnt think so
thats my fastapi get
serving a non-200 before it's done with the coroutine
then serves 200
as u see here
you run the healthcheck on every request to that endpoint?
technically yeah
but its awaiting
okay now throw in an early
return 200
just for fun, skip the actual health check, as I understand your situation you are not working on a live site right now so it doesn't matter if stuff crashes?
hahaha
no it is live
just not on railway
so its ok
perfect, just as I thought
the whole idea of railway was to use it like a node
with 2 more but it would need to fit in 1 service then
gotcha
it takes ages to build though
give it 4 min
yep, let me know how that goes
also i really appreciate you helping me
is a 4 minute build normal?
i have been frustrated with this the entire day
yeah bro
are you deploying with a dockerfile?
yes
its dockerizing it automatically
haha well yeah, I was more so asking if you were bringing your own Dockerfile to the party
ah no
its just a repo with raw .py
gotcha
it went thru the health now
so that means
there IS
an individual time limit
while as u saw my long ass healthcheck DID return 200 eventually
so thats not cool
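One way to reproduce this locally before pushing is to poll the endpoint yourself with a short per-request timeout. The sketch below is only a stand-in for a deploy-time checker: the thread never establishes what Railway's per-attempt limit actually is, so `attempt_timeout`, `window` and `interval` are made-up knobs, and `probe` is a hypothetical helper name.

```python
import http.client
import time
from typing import Optional


def probe(host: str, port: int, path: str = "/health", *,
          attempt_timeout: float, window: float,
          interval: float = 1.0) -> Optional[int]:
    """Poll until a 200 arrives.

    Returns the number of the first successful attempt, or None if the
    overall window closes without any attempt getting a 200 in time.
    """
    deadline = time.monotonic() + window
    attempt = 0
    while time.monotonic() < deadline:
        attempt += 1
        conn = http.client.HTTPConnection(host, port, timeout=attempt_timeout)
        try:
            conn.request("GET", path)
            if conn.getresponse().status == 200:
                return attempt
        except (OSError, http.client.HTTPException):
            # Refused, timed out, or malformed reply: a failed attempt.
            pass
        finally:
            conn.close()
        time.sleep(interval)
    return None
```

If the handler needs longer than `attempt_timeout` on every request, this returns `None` no matter how many retries fit in the window, which matches what the deploy logs above were showing: the 200 does get sent eventually, but nobody is still waiting for it.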
okay so there is a solution to this
how do I explain
celery all of 'em?
haha no, much simpler
oh
WORD
instead of running your healthcheck on every request of that endpoint, only let health.healthcheck ever run once, so that you will see a few failed healthchecks in the deployment logs while your single healthcheck check runs
then once your healthcheck finally finishes, update the status code that route returns to reflect a failed or successful healthcheck
so something like this
or
oh woops
wait inversed
haha yeah
i havent slept yet so my bad
that looks about right no?
let me think this over
ok bro
nah I can see this running a health check every request, since the healthcheck value will be zero until the healthcheck returns
global cache it?
yeah just make sure you sync access to the global value
but it can only run once anyways no?
asyncio sleep it could be feasible too
you could also have a Boolean flag named healthcheck_in_progress, and on the first request to your endpoint set that true, then run the healthcheck in a thread and update the flag to false and return the correct status
that way it's always an instant return of a non successful status code until the moment your app finished the background healthcheck, then the thread updates the status code the route returns and railway switches in your deployment
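Roughly what that flag-plus-thread idea could look like, kept framework-agnostic so it is not tied to any particular server. This is a sketch, not the code that was shared in the thread: `BackgroundHealthcheck` is a hypothetical name, and the choice of 503 while the check is running and 500 on failure is an assumption (any non-200 would do for the purpose described above).

```python
import threading
from typing import Callable, Optional


class BackgroundHealthcheck:
    """Runs a slow check exactly once in a thread; status_code() never blocks."""

    def __init__(self, check: Callable[[], bool]) -> None:
        self._check = check
        self._lock = threading.Lock()  # guards the two fields below
        self._started = False
        self._passed: Optional[bool] = None  # None while the check is running

    def _run(self) -> None:
        try:
            ok = bool(self._check())
        except Exception:
            ok = False  # a crashing check counts as a failed check
        with self._lock:
            self._passed = ok

    def status_code(self) -> int:
        with self._lock:
            if not self._started:
                # First request kicks off the one and only run.
                self._started = True
                threading.Thread(target=self._run, daemon=True).start()
            passed = self._passed
        if passed is None:
            return 503  # still running: answer instantly with a non-200
        return 200 if passed else 500
```

The health route then only has to return `hc.status_code()` as its HTTP status, so every request answers immediately and the early attempts simply show up as failures until the single background run finishes.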
i havent worked with threads that much
this is awaited / coroutined
does that matter?
nah you could make it work
is health.healthcheck non blocking without the await?
u cant run it without
it calls over 100 functions
and theres stripe payments being processed
and well
thats self explanatory
fair, then you'd need to await it in a separate thread to turn it into a healthcheck that runs in the background
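For a check that is a coroutine, one possible way to "await it in a separate thread" is to give that thread its own event loop with `asyncio.run`. A sketch under assumptions: `AsyncCheckRunner` is a hypothetical name, the 503/500 codes are illustrative, and it presumes the coroutine does not depend on objects bound to the server's own event loop (since `asyncio.run` creates a fresh loop in the worker thread).

```python
import asyncio
import threading
from typing import Any, Callable, Coroutine


class AsyncCheckRunner:
    """Runs an async check once on its own event loop in a worker thread."""

    def __init__(self, check: Callable[[], Coroutine[Any, Any, Any]]) -> None:
        self._check = check
        self.done = threading.Event()  # set once a result is available
        self.passed = False

    def start(self) -> None:
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self) -> None:
        try:
            # asyncio.run creates and closes a private loop for this thread.
            self.passed = bool(asyncio.run(self._check()))
        except Exception:
            self.passed = False  # treat any exception as a failed check
        finally:
            self.done.set()

    def status_code(self) -> int:
        if not self.done.is_set():
            return 503  # background check still in flight
        return 200 if self.passed else 500
```

Here something like `AsyncCheckRunner(health.healthcheck)` would be started once at boot or on the first request, and the health route would just report `status_code()` without awaiting anything.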
coding is so wild sometimes
what a great first project to do
haha so what is this project anyway
and if you don't mind me asking (you don't have to answer) where do you currently have it deployed?
haaaa man
you do not want to know
so technically this is just the wrapper yes
GPUs
yezzor
AI
it needs GPUs?
yes
lots
are you aware railway doesn't offer GPU compute?
i have the GPUs myself
this is the wrapper
on railway
ohhhhh
just as a node remember
I see now
very cool
yes
we made 2 trading algorithms last year and they did really well over the year
so now this is a step further
or well we, back then i had developers and i traded and just instructed
but now i learned how to dev myself
that's awesome!
brody
my friend
i owe you something
what a brilliant idea
this was the solution
thank u so much for ur time man it really means a lot
man can i buy u a coffee or something hahahahah
fucking hell
looks good!
if you want to you can, but you absolutely don't have to
the commits of a dying man
send me ur paypal ill send u a coffee bro
I hate uvicorn too, you should use hypercorn 🤣
bro
anything with -corn
im not touching it anymore
holy fuck
that's why I do golang, no silly things needed to get a web server running in production
I actually have a buymeacoffee, it's in my bio, but seriously you don't need to
i just saw that yea how coincidental
real question though bro why do you NOT work for railway
not qualified
in what sense?
it seems like you're doing a pretty good job
thanks again
fuck me bro
i refactored
that entire file
6 times
just for railway to time out in between retries
haha I'll ask the team about increasing that time limit
in the sense that I have no work experience
or at least
let them write it down somewhere
because the docs werent really helpful
im sure i wont be the only one with an almost 1k line healthchecker
i see i see, how old are you?
good point, I will put it in the docs once I get more information
I think 22
u think
im 21
I stopped counting
whahwhahaw
real
me with my bug fix commits today
🤣
good lord
thanks for the train btw, means a lot
i mean bro you saved me some time thats for sure
haha maybe
alright bro i gotta go, imma be more active in this discord cus i learn a lot
cya
awesome, welcome to the server!!