Railway•14mo ago

Can't seem to get the healthcheck to work, works fine locally over FASTapi and Flask

service ID: 2a262f8f-be17-475a-8463-21e12fafebbf I really hate opening this ticket guys, i'm sorry in advance, but i can't seem to figure it out. i'm sure it's something small i must have missed. i'm running a pretty big python repository with 7-10 seconds worth of healthchecks being done before returning status 200, however when i deploy on railway it just keeps timing out (the API itself worked fine with the current config, it's just the healthcheck endpoint which is acting up). for context, i am running main.py from my procfile and my API is in another python file (both are being initialized though). Also the API has to run on port 4242 as it's interacting with the stripe API via webhooks

171 Replies

Percy•14mo ago

Project ID: 2a262f8f-be17-475a-8463-21e12fafebbf

Floris•14mo ago

if anyone has some spare time and would be willing to try and help me out, i'd greatly appreciate it. @brody192 what are my options to run multiple processes concurrently if procfiles are off the board? i dont really wanna subprocess into different py files w popen

Brody•14mo ago

you wanna go over that before we get the health check working?

Floris•14mo ago

that is the root problem of my healthcheck, as i have a main file and a separate api file for my endpoints, and i was trying to init both of them separately via the procfile, hence the endpoint not working. deploying via 2 services is not really an option as that would defeat the point of the healthcheck

Brody•14mo ago

interesting setup you have

Floris•14mo ago

yes

Brody•14mo ago

what does the main.py file do on its own?

Floris•14mo ago

its the main handler as its a mono repo the repo is like 8 or 9 k lines

Brody•14mo ago

how many services in this mono repo

Floris•14mo ago

i wire all thru 1 or well i did till the healthcheck had other ideas

Brody•14mo ago

how many different ports are in use

Floris•14mo ago

1 pre-set for stripe and 1 that railway assigns itself i believe?

Brody•14mo ago

I see, but unfortunately, per service you can only expose one port publicly, and ideally your app listens on $PORT
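
A minimal sketch of what "listening on $PORT" can look like in Python. `resolve_port` and the 4242 fallback are illustrative choices, not taken from Floris's repo, and the commented `uvicorn.run` call assumes the FastAPI instance is named `app`.

```python
import os


def resolve_port(default: int = 4242) -> int:
    """Return the port to bind to: $PORT when the platform sets it, else a local default."""
    raw = os.environ.get("PORT")
    return int(raw) if raw else default


# Hypothetical wiring (requires uvicorn; `app` is an assumed FastAPI instance):
# uvicorn.run(app, host="0.0.0.0", port=resolve_port())
```

Locally the fallback keeps the Stripe-style port, while on a platform that injects `PORT` the same code binds to whatever it is given.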

Floris•14mo ago

i see, i suppose i can route all over 1 port no? as long as my endpoints are different. its only internal traffic that goes over that api so its not a big deal

Brody•14mo ago

yep the running solution for services that listen on multiple ports is to use endpoints
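
A rough sketch of the "one port, different endpoints" idea, written with the standard library only so it runs without FastAPI (a swapped-in stand-in, not what the thread's app uses). The paths `/health` and `/stripe/webhook`, the class name, and `make_server` are all hypothetical.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SinglePortHandler(BaseHTTPRequestHandler):
    """Serve the healthcheck and the webhook receiver from the same port."""

    def _send(self, code: int, body: bytes) -> None:
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        if self.path == "/health":
            self._send(200, b"ok")
        else:
            self._send(404, b"not found")

    def do_POST(self) -> None:
        if self.path == "/stripe/webhook":
            length = int(self.headers.get("Content-Length", 0))
            self.rfile.read(length)  # consume the payload; real handling would go here
            self._send(200, b"received")
        else:
            self._send(404, b"not found")

    def log_message(self, format, *args) -> None:
        pass  # keep the sketch quiet


def make_server(port: int) -> ThreadingHTTPServer:
    """Build (but do not start) a server bound to the given port on all interfaces."""
    return ThreadingHTTPServer(("0.0.0.0", port), SinglePortHandler)
```

Calling `make_server(port).serve_forever()` would start it; both routes then share whichever single port the service exposes.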

Floris•14mo ago

its ok i only run stripe over 4242 and the other port would be health

Brody•14mo ago

ideally in the future you would be able to map internal ports to different domains on port 443 externally

Floris•14mo ago

i have 3 assigned domains

Brody•14mo ago

but the healthcheck does need to listen to $PORT since that check is made internally

Brody•14mo ago

yeah there's no native way to map those external domains to internal ports on your service, without running a proxy that does host matching

Floris•14mo ago

ahh okay okay thats a shame endpoints it is

Brody•14mo ago

if you want an example of that, I have one prepared just so you don't need to modify any of your code

Floris•14mo ago

yes sure if you want

Brody•14mo ago

okay one sec, let me find, since I whipped up an example for someone else that wanted to map internal ports to different subdomains on the same service https://discord.com/channels/713503345364697088/1154106744306421830/1154267922714345523 if you need any clarifications on anything I said in that thread just ask

Floris•14mo ago

quite impressive that you came up w that bro jeesus

Brody•14mo ago

everything I've learnt about railway is from being with the community

Floris•14mo ago

thats amazing, i've never been in any coding communities. but basically the crux of what you're saying is i cant have 4242 if i want the health path to work without a proxy server

Brody•14mo ago

yeah since your app listens on different ports

Floris•14mo ago

well guys that was 4 hours down the drain 💀 back 2 github actions i go thanks for the help i appreciate it

Brody•14mo ago

haha I was a sleep 4 hours ago, wish I could have gotten to help you sooner

Floris•14mo ago

imma try 1 sketchy thing

Brody•14mo ago

ouuu what ya got in mind

Floris•14mo ago

imma subprocess the bitch im sick of it via daphne

Brody•14mo ago

does what you're going to subprocess need to access the same filesystem as the rest of the monorepo services?

Floris•14mo ago

im subprocessing the endpoint for the healthcheck

Brody•14mo ago

because you could just run the 3 things separately in 3 different services

Floris•14mo ago

it would defeat the point

Brody•14mo ago

unless your healthcheck actually does more than just return 200

Floris•14mo ago

the whole idea of having the health check halter is to stop any faulty commits coming thru, so if i have the same repo on multiple services one would always need to be out of sync w the commits of the other. thats not really practical. i believe it does like 39 healthchecks internally, or 37, im not sure

Brody•14mo ago

impressive

Floris•14mo ago

and that just returns 1 int if all pass yeah i mean we cant afford our main branch being down for some stupid reason

Brody•14mo ago

that's a whole lot more thorough than return 200 in a /health route

Floris•14mo ago

yes isnt that the point of it though or am i wrong hahaha woops

Brody•14mo ago

no, you are definitely using a health check properly, though simply returning a fixed status code of 200 is still useful too. not for your case, but you know

Floris•14mo ago

yeah but then i could just use the webhooks from railway no? for the deployment status change, if thats all i want to query. does a healthcheck have to be completed in between the retries?

Brody•14mo ago

for simple code bases it's often plenty to just return 200 so that railway knows when your new deployment is ready to start accepting requests, giving you less of a switch over period

Floris•14mo ago

ahhh i see i see this is our main backend repo so 80% of our code is here

Brody•14mo ago

railway will just retry your healthcheck endpoint for up to 5 minutes

Floris•14mo ago

hence why its big, yes i know. but it does like 20 retries or something. is there a time limit on how long the check can last in between the retries? thats what i wonder

Brody•14mo ago

and if it never gets a 200, it will never switch in your new deployment

Floris•14mo ago

yes i like that

Brody•14mo ago

oh I see what you mean

Floris•14mo ago

as github actions is slow

Brody•14mo ago

what is the timeout of the individual check

Floris•14mo ago

healthcheck takes longer than 10 seconds? i dont know, thats what i wonder

Brody•14mo ago

that's a good question

Floris•14mo ago

i have tried coroutining everything already but i cant get below 7 seconds

Brody•14mo ago

I'm not 100% sure but I can't imagine it wouldn't wait for 10 seconds

Floris•14mo ago

yeah i mean u never know could be something small

Brody•14mo ago

what are your deployment logs looking like during the healthcheck attempts

Floris•14mo ago

attempt # 9392932 failed every few seconds

Brody•14mo ago

what does that correlate to in code? failed database connection or something?

Floris•14mo ago

i dont know its generated by railway it doesnt get past building if it doesnt pass health

Brody•14mo ago

yes it does actually

Floris•14mo ago

ah ok for me it doesnt

Brody•14mo ago

your deployment is still run, there should be deployment logs, since your deployment needs to be running for any type of healthcheck to work

Floris•14mo ago

oh i see wtf i didnt see that up until now

Floris•14mo ago

build

Brody•14mo ago

click deploy logs

Floris•14mo ago

its clearly returning 200

Brody•14mo ago

well isn't that odd

Floris•14mo ago

but railway doesnt think so

Floris•14mo ago

thats my fastapi get

Floris•14mo ago

its serving a non-200 before its done with the coroutine, then serves 200 as u see here

Brody•14mo ago

you run the healthcheck on every request to that endpoint?

Floris•14mo ago

technically yeah but its awaiting

Brody•14mo ago

okay now throw in an early return 200 just for fun, skip the actual health check, as I understand your situation you are not working on a live site right now so it doesn't matter if stuff crashes?

Floris•14mo ago

hahaha no it is live just not on railway so its ok

Brody•14mo ago

perfect, just as I thought

Floris•14mo ago

the whole idea of railway was to use it like a node with 2 more but it would need to fit in 1 service then

Brody•14mo ago

gotcha

Floris•14mo ago

it takes ages to build though give it 4 min

Brody•14mo ago

yep, let me know how that goes

Floris•14mo ago

also i really appreciate you helping me

Brody•14mo ago

is a 4 minute build normal?

Floris•14mo ago

i have been frustrated with this the entire day yeah bro

Brody•14mo ago

are you deploying with a dockerfile?

Floris•14mo ago

yes its dockerizing it automatically

Brody•14mo ago

haha well yeah, I was more so asking if you were bringing your own Dockerfile to the party

Floris•14mo ago

ah no its just a repo with raw .py

Brody•14mo ago

gotcha

Floris•14mo ago

it went thru the health now, so that means there IS an individual time limit

Floris•14mo ago

while as u saw my long ass healthcheck DID return 200 eventually so thats not cool

Brody•14mo ago

okay so there is a solution to this how do I explain

Floris•14mo ago

celery all of m?

Brody•14mo ago

haha no, much simpler

Floris•14mo ago

oh WORD

Brody•14mo ago

instead of running your healthcheck on every request to that endpoint, only let health.healthcheck ever run once. you will see a few failed healthchecks in the deployment logs while your single healthcheck runs, then once it has finally finished, update the status code that route returns to reflect a failed or successful healthcheck

Floris•14mo ago

so something like this or

Floris•14mo ago

oh woops wait inversed

Brody•14mo ago

haha yeah

Floris•14mo ago

i havent slept yet so my bad. that looks about right no?

Brody•14mo ago

let me think this over

Floris•14mo ago

ok bro

Brody•14mo ago

nah I can see this running a health check every request, since the healthcheck value will be zero until the healthcheck returns

Floris•14mo ago

global cache it?

Brody•14mo ago

yeah just make sure you sync access to the global value

Floris•14mo ago

but it can only run once anyways no?

Floris•14mo ago

asyncio sleep it could be feasible too

Brody•14mo ago

you could also have a Boolean flag named healthcheck_in_progress, and on the first request to your endpoint set that to true, then run the healthcheck in a thread, and when it's done update the flag to false and return the correct status. that way it's always an instant return of a non successful status code until the moment your app finishes the background healthcheck, then the thread updates the status code the route returns and railway switches in your deployment
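
A minimal sketch of the "instant non-200 until the background check finishes" idea. It uses an asyncio task instead of the thread Brody mentions (a swapped-in technique, since the thread's `health.healthcheck` is a coroutine). `BackgroundHealth`, the 503/500 codes, and the assumption that the check returns a truthy value on success are all illustrative.

```python
import asyncio


class BackgroundHealth:
    """Start an async check once, and report a status code without ever blocking."""

    def __init__(self, check):
        self._check = check  # zero-argument coroutine function (assumed to return truthy on success)
        self._task = None

    def status_code(self) -> int:
        # Must be called from inside a running event loop (e.g. an async route).
        if self._task is None:
            # First call: kick off the one and only check in the background.
            self._task = asyncio.get_running_loop().create_task(self._check())
        if not self._task.done():
            return 503  # still running, answer immediately with a non-200
        if self._task.cancelled() or self._task.exception() is not None:
            return 500  # the check itself crashed
        return 200 if self._task.result() else 500
```

In a FastAPI route this could plausibly be wired by setting `response.status_code = state.status_code()` on an injected `Response`, so the early healthcheck attempts fail fast and a later one sees the final result.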

Floris•14mo ago

i havent worked with threads that much. this is awaited / coroutined, does that matter?

Brody•14mo ago

nah you could make it work. is health.healthcheck non blocking without the await?

Floris•14mo ago

u cant run it without, it calls over 100 functions and theres stripe payments being processed, and well thats self explanatory

Brody•14mo ago

fair, then you'd need to await it in a separate thread to turn it into a healthcheck that runs in the background
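
One way "await it in a separate thread" might look, as a sketch under assumptions: the coroutine is driven by its own event loop via `asyncio.run` inside a worker thread, and a lock guards the shared status (the "sync access to the global value" point). `ThreadedHealth` and the 503/500 codes are hypothetical, and any client objects bound to the main event loop may not be usable from the worker's loop.

```python
import asyncio
import threading


class ThreadedHealth:
    """Run an async healthcheck once in a worker thread; reads never block on it."""

    def __init__(self, check):
        self._check = check  # zero-argument coroutine function (assumed truthy on success)
        self._lock = threading.Lock()
        self._started = False
        self._code = 503  # reported until the background check finishes

    def _run(self) -> None:
        try:
            # asyncio.run gives this thread its own event loop for the coroutine.
            code = 200 if asyncio.run(self._check()) else 500
        except Exception:
            code = 500
        with self._lock:
            self._code = code

    def status_code(self) -> int:
        with self._lock:
            if not self._started:
                self._started = True  # guarantees the check is launched only once
                threading.Thread(target=self._run, daemon=True).start()
            return self._code
```

The first call starts the worker and still returns 503; later calls return 200 or 500 once the worker has stored the outcome.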

Floris•14mo ago

coding is so wild sometimes what a great first project to do

Brody•14mo ago

haha so what is this project anyway and if you don't mind me asking (you don't have to answer) where do you currently have it deployed?

Floris•14mo ago

haaaa man you do not want to know so technically this is just the wrapper yes

Brody•14mo ago

GPUs

Floris•14mo ago

yezzor AI

Brody•14mo ago

it needs GPUs?

Floris•14mo ago

yes lots

Brody•14mo ago

are you aware railway doesn't offer GPU compute?

Floris•14mo ago

i have the GPUs myself this is the wrapper on railway

Brody•14mo ago

ohhhhh

Floris•14mo ago

just as a node remember

Brody•14mo ago

I see now very cool

Floris•14mo ago

yes we made 2 trading algorithms last year and they did really well over the year, so now this is a step further. or well, "we": back then i had developers and i traded and just instructed, but now i learned how to dev myself

Brody•14mo ago

that's awesome!

Floris•14mo ago

brody my friend i owe you something what a brilliant idea

Floris•14mo ago

global_healthcheck = 0

@app.get("/health")
async def health_repo():
    global global_healthcheck
    
    status_code = 300

    if global_healthcheck == 0:    
        _ = await health.healthcheck()
        global_healthcheck += 1

    return {"status_code": status_code}

this was the solution thank u so much for ur time man it really means alot man can i buy u a coffee or something hahahahah fucking hell

Brody•14mo ago

looks good!

Brody•14mo ago

if you want to you can, but you absolutely don't have to

Floris•14mo ago

the commits of a dying man send me ur paypal ill send u a coffee bro

Brody•14mo ago

I hate uvicorn too, you should use hypercorn 🤣

Floris•14mo ago

bro anything with -corn im not touching it anymore holy fuck

Brody•14mo ago

that's why I do golang, no silly things needed to get a web server running in production. I actually have a buymeacoffee, it's in my bio, but seriously you don't need to

Floris•14mo ago

i just saw that yea, how coincidental. real question though bro, why do you NOT work for railway

Brody•14mo ago

not qualified

Floris•14mo ago

in what sense? it seems like you're doing a pretty good job

Floris•14mo ago

https://www.buymeacoffee.com/brody192/c/6966224

Floris•14mo ago

thanks again fuck me bro i refactored that entire file 6 times just for railway to time out in between retries 😭

Brody•14mo ago

haha I'll ask the team about increasing that time limit. and not qualified in the sense that I have no work experience

Floris•14mo ago

or at least let them write it down somewhere because the docs werent really helpful, im sure i wont be the only one with an almost 1k line healthchecker. i see i see, how old are you?

Brody•14mo ago

good point, I will put it in the docs once I get more information. I think 22

Floris•14mo ago

u think? im 21

Brody•14mo ago

I stopped counting

Floris•14mo ago

whahwhahaw real me with my bug fix commits today

Brody•14mo ago

🤣

Floris•14mo ago

good lord

Brody•14mo ago

thanks for the train btw, means a lot

Floris•14mo ago

i mean bro you saved me some time thats for sure

Brody•14mo ago

haha maybe

Floris•14mo ago

alright bro i gotta go, imma be more active in this discord cus i learn alot cya

Brody•14mo ago

awesome, welcome to the server!!
