Supabase•4w ago

Crons just randomly failing to run

Over the last few weeks, by the looks of it, my cron jobs have been randomly failing to run. I have 3 crons set to run every 15 minutes: on the hour, 15, 30, 45. Some of them just randomly didn't run, according to the logs. These crons fire some edge functions, and I'm seeing the same pattern in the edge function logs, which leads me to believe the cron itself is not being run. It was rock solid: I could see a "flat pattern" of 12 an hour in the logs. Now it's randomly up/down.

[Screenshot attached; no description]

64 Replies

silentworks•4w ago

Is the job showing up in the cron logs? https://supabase.com/dashboard/project/_/logs/pgcron-logs

garyaustin•4w ago

There is also the cron.job_run_details table, which should show every cron job when it runs. More debugging help: https://supabase.com/docs/guides/troubleshooting/pgcron-debugging-guide-n1KTaz

David.StantonOP•4w ago

@silentworks @garyaustin the screenshot I posted is from the logs

garyaustin•4w ago

You don’t show any logs. Just some sort of usage report graph.

David.StantonOP•4w ago

It might actually be that it's running the cron twice, which would make the stats look up/down. Still just as bad to run them twice when each of the 3 jobs is set to the hour, 15, 30, 45.

[Image attached; no description]

David.StantonOP•4w ago

@garyaustin it's actually both: it missed 9:15 completely and ran 10:30 twice, a double "bad"

garyaustin•4w ago

No idea why it would miss or double, or miss and catch up, other than the guide mentioning that resources could cause a run issue. You might see if there are any issues here: https://github.com/citusdata/pg_cron or on the web. Cron is not Supabase code; it is strictly that third-party extension and Postgres.
Not sure if support could help or not. Also check cron.job_run_details and confirm it shows the same thing. Are these cron tasks using pg_net (if done with the cron UI they are) to call the edge function, or http?

David.StantonOP•4w ago

@garyaustin using the cron ui

garyaustin•4w ago

Then even if the edge function did not respond there would be no hang-up, as it uses pg_net, which does not wait for a response. So that adds to the puzzle.

garyaustin•4w ago

Did you check if the details table matches the log? Also check the job table to make sure there are not extra cron jobs there...

[Image attached; no description]

garyaustin•4w ago

I would think jobid can't be duplicated though, but have never checked...

David.StantonOP•4w ago

@garyaustin no extra jobs in jobs table

garyaustin•4w ago

Got nothing else. I've done a quick search here, in GitHub discussions and in the pg_cron repository, and didn't find any similar cases (within the limits of my search terms hitting).

David.StantonOP•4w ago

Unless the logs themselves are wrong?

garyaustin•4w ago

That is why I suggested checking the details table. That is a pg_cron table. The logs are all supabase things. It also shows start time and end time which might show if something does not finish(?).
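As a rough illustration of the check suggested above, the sketch below takes start times as they might be copied out of the start_time column of cron.job_run_details and reports which 15-minute slots had no run or more than one. The names auditRuns and SlotReport and the sample timestamps are hypothetical. It assumes each run starts inside its own slot and that the window boundaries are aligned to the schedule.

```typescript
// Hypothetical helper: find missed and duplicated 15-minute slots.
interface SlotReport {
  missed: string[];     // slots with no run at all
  duplicated: string[]; // slots with more than one run
}

const SLOT_MS = 15 * 60 * 1000;

function auditRuns(startTimes: string[], fromIso: string, toIso: string): SlotReport {
  // Count runs per slot, flooring each start time to its 15-minute boundary (UTC).
  const counts: Record<number, number> = {};
  for (const t of startTimes) {
    const slot = Math.floor(Date.parse(t) / SLOT_MS) * SLOT_MS;
    counts[slot] = (counts[slot] ?? 0) + 1;
  }

  // Walk every expected slot in [fromIso, toIso) and classify it.
  const report: SlotReport = { missed: [], duplicated: [] };
  for (let slot = Date.parse(fromIso); slot < Date.parse(toIso); slot += SLOT_MS) {
    const n = counts[slot] ?? 0;
    const label = new Date(slot).toISOString();
    if (n === 0) report.missed.push(label);
    if (n > 1) report.duplicated.push(label);
  }
  return report;
}

// Made-up sample data: nothing in the 09:15 slot, two runs in the 09:30 slot.
const sample = auditRuns(
  ["2025-01-01T09:00:01Z", "2025-01-01T09:30:00Z", "2025-01-01T09:30:02Z"],
  "2025-01-01T09:00:00Z",
  "2025-01-01T09:45:00Z",
);
console.log(sample);
```

Running the same check against both the dashboard logs and the details table would show whether the two sources agree on which slots were missed or doubled.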

David.StantonOP•3w ago

@garyaustin I've found this. Any idea why loads of these are timing out? All the stats in the edge functions etc. say they fired, so I've got no real clue what all these "timeouts" are, but they all happen at the times the crons run, which fire the edge functions.

[Image attached; no description]

garyaustin•3w ago

I suspect you set the timeout to 5000 when you created the edge function cron task in the UI (not sure if you said whether you used the UI). This timeout is passed to pg_net. I've never seen a great explanation of what can impact the timeout, but it seems to be caused by overloading pg_net with requests. Are you also doing webhooks on a table with heavy inserts/updates? https://github.com/supabase/pg_net/issues/179#issue-2929010661 If you wrote your own call with http to the edge function, it is taking longer than the timeout for http.

GitHub

Currently Does not Scale Well · Issue #179 · supabase/pg_net

Bug report Once pg_net reaches a certain number of jobs, jobs start to time out aggressively. This makes it unsuitable for any type of production workload where you may be dealing with a burstable ...

David.StantonOP•3w ago

@garyaustin yes, the timeout in the crons is set to 5000, it's the max it will take. I do have webhooks that fire if an "account" is updated at any time, so that 1 account is "auto updated" before the cron runs on the 0, 15, 30, 45.

garyaustin•3w ago

Look in your net schema and see how often pg_net is being called.

David.StantonOP•3w ago

where do i find that?

garyaustin•3w ago

Table UI, select net schema... There are two tables. One is results and one is launched requests.

David.StantonOP•3w ago

The one I posted above is the other one, firing at the same times as the crons (0, 15, 30, 45), 3 times at each "time". I'd hardly call that "extensive" or "large use"?

[Image attached; no description]

garyaustin•3w ago

Anything in response? If not, you have not run in a while, as the tables clear out in some short time frame.

David.StantonOP•3w ago

Responses, the one I posted above, with all the timeouts.

garyaustin•3w ago

I thought that was a cron table. That was net._http_response?

David.StantonOP•3w ago

http_response, yes

garyaustin•3w ago

I see that it is now. So maybe timeout IS how long your edge function has to respond. As I said it is not clear. If so and it takes longer than 5 seconds pg_net would error.

garyaustin•3w ago

OK it appears that is what it is...

garyaustin•3w ago

So your edge function would have to respond in 5 seconds. If I read that right. It is still not 100% clear.

David.StantonOP•3w ago

It's clear as mud, and if these "timeouts" are what's causing the stats etc. to look up/down, it's basically completely "unfit for purpose"/useless if you can't fire 3 (!) crons.

garyaustin•3w ago

Not sure what 3 crons would have to do with it IF the timeout is on the edge function.

David.StantonOP•3w ago

OK, so a function that can't complete within 5 s when it updates 10 rows is equally bad. That should take 500 ms or less.

garyaustin•3w ago

Agree. You would have to look at if that is the case or not.

David.StantonOP•3w ago

Does this run on "Cloudflare" functions? All the headers have cloudflare in them.

garyaustin•3w ago

No, edge functions are hosted by Supabase on AWS hardware and run Deno. We don't usually run into issues with DB operations being slow. But if you are using the REST API and doing 10 updates versus 1 update of 10 rows, then latency could add up quickly.
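To make the latency point concrete, here is a back-of-the-envelope sketch. The numbers (150 ms per round trip, a few milliseconds of database work) are made-up assumptions, not measurements from this project, and estimateTotalMs is a hypothetical helper.

```typescript
// Rough model: sequential requests pay the network round trip once per request.
function estimateTotalMs(requests: number, roundTripMs: number, workMsPerRequest: number): number {
  return requests * (roundTripMs + workMsPerRequest);
}

const ROUND_TRIP_MS = 150; // assumed latency of one REST call

// 10 separate single-row updates, each with about 5 ms of database work (assumed).
const separate = estimateTotalMs(10, ROUND_TRIP_MS, 5); // 10 * 155 = 1550

// 1 update touching all 10 rows, with about 50 ms of database work (assumed).
const batched = estimateTotalMs(1, ROUND_TRIP_MS, 50); // 1 * 200 = 200

console.log({ separate, batched });
```

Under these assumptions the row count barely matters; the number of round trips dominates, which is why the question of separate updates versus one batched update comes up.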

David.StantonOP•3w ago

Where do they fire from/to then? How can I set it to fire from us-east to us-east?

garyaustin•3w ago

They fire closest to the user by default. You can force them to a specific region.

David.StantonOP•3w ago

It's a cron, so where is the user lol

garyaustin•3w ago

True. Then should be near them. Do the edge logs give you a feel for how long the functions are taking? You could console.log in them to tell.

David.StantonOP•3w ago

If the cron fires from Timbuktu to New Zealand, there's probably 4 s of latency.

garyaustin•3w ago

And I'm only speculating that the 5000 msec is the time to respond. It has always been fuzzy, at least to me, what the timeout is for pg_net.

David.StantonOP•3w ago

not really

[Image attached; no description]

silentworks•3w ago

Not very accurate, I've seen users report larger latency than this before. There are many things to consider when it comes to latency besides just location.

David.StantonOP•3w ago

@silentworks even worse if others are reporting even higher. It's almost a duff product if you can't fire a function in less than 5 seconds.

silentworks•3w ago

It's not about the function firing, it's about what the function is doing and how long it takes to get a response.

garyaustin•3w ago

Are you doing 10 separate REST requests in the function?

silentworks•3w ago

A simple return Response('Hello World') will give an immediate response.

David.StantonOP•3w ago

It's updating 20 rows in the db, about as small as it can possibly get. How could anyone use it on 20000 rows lol

garyaustin•3w ago

20 separate updates or one update of 20 rows?

silentworks•3w ago

Not trying to make excuses for Supabase edge functions here, but 20 rows could contain any amount of data. If I'm updating 20 rows with a huge json payload it will take longer than 200 rows with just one column with a single word text.

garyaustin•3w ago

Best thing to do is console.log the edge function start and return to see if this is even it.
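A minimal sketch of that kind of start/finish logging is below. The helper name timed and the label are hypothetical, and the wrapper only measures wall-clock time inside the function, not network time to or from it.

```typescript
// Hypothetical helper: log when a unit of work starts and how long it took.
async function timed<T>(
  label: string,
  work: () => Promise<T>,
  log: (msg: string) => void = console.log,
): Promise<T> {
  const start = Date.now();
  log(`${label}: start`);
  try {
    return await work();
  } finally {
    // Runs on success and on error, so slow failures are visible too.
    log(`${label}: finished after ${Date.now() - start} ms`);
  }
}

// Inside an edge function this might wrap the handler body, for example:
//   Deno.serve((req) => timed("update-accounts", () => handle(req)));
// where handle is a placeholder for the existing request handler.
```

If the logged duration is well under 5 seconds while the pg_net response table still shows timeouts, the time is being lost somewhere other than the function body, for example in a cold start.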

David.StantonOP•3w ago

at max a 300 line json

silentworks•3w ago

I'd suggest you follow Gary's suggestion and also try and call that edge function directly too from your local machine with the same payload and see how long it takes. If the edge function is not timing out then we narrow this down to the cron job itself.

garyaustin•3w ago

The edge function would not necessarily be timing out. The pg_net call has a timeout of 5000 msec max. I ASSUME that is for the request to respond, but I have not seen that well defined. So if the edge function has not finished what it is doing in 5 s, you get the timeout. EDIT: I'm not sure you showed cron tasks. I glanced up and it was my own post showing what to look for... SIGH. Also, the crons you showed up top do NOT seem to be calling edge functions. They just do SQL. That is why I thought you were posting a cron table. Is this now a different issue with a webhook on update? That should not be related to cron activity.

David.StantonOP•3w ago

DNS:0.032116 Connect:0.053816 TLS:0.141154 TTFB:0.141516 Total:1.684647
Is there some sort of "hotness" involved here in EFs? How long do they stay hot for? The first local test was 4.2 sec; if I fire a few one after another, they come in under 2 s. If they stay hot for, say, 5 mins, I can try firing the crons every 5 mins and see if it gets "back to stable".
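For a sense of how close these numbers are to the limit, the sketch below parses a timing line in the format posted above and compares the total with the 5000 ms timeout discussed in this thread. parseTimings and headroomMs are hypothetical helpers, and the comparison assumes the timeout applies to the whole response time, which the thread itself has not confirmed.

```typescript
// Parse a "Name:seconds" timing line into whole milliseconds per phase.
function parseTimings(line: string): Record<string, number> {
  const out: Record<string, number> = {};
  for (const part of line.trim().split(/\s+/)) {
    const [name, seconds] = part.split(":");
    out[name] = Math.round(parseFloat(seconds) * 1000);
  }
  return out;
}

const TIMEOUT_MS = 5000; // the cron UI timeout mentioned in this thread

// Time left before a caller with the given timeout would give up.
function headroomMs(totalMs: number, timeoutMs: number = TIMEOUT_MS): number {
  return timeoutMs - totalMs;
}

const warm = parseTimings(
  "DNS:0.032116 Connect:0.053816 TLS:0.141154 TTFB:0.141516 Total:1.684647",
);
console.log("warm headroom (ms):", headroomMs(warm.Total));
console.log("first-request headroom (ms):", headroomMs(4200)); // the 4.2 s first test
```

On these figures the 4.2 s first request leaves well under a second of margin, so a slightly slower cold start could tip it over the limit while warm requests pass comfortably.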

silentworks•3w ago

I'm not sure how long they stay hot for, but they do have a cold start on the first request to that function. So you could try making two requests a few minutes apart.

garyaustin•3w ago

https://supabase.com/blog/persistent-storage-for-faster-edge-functions

Persistent Storage and 97% Faster Cold Starts for Edge Functions

Mount S3-compatible buckets as persistent file storage in Edge Functions with up to 97% faster cold start times.


silentworks•3w ago

Also, you can set up the updates in the function as background tasks if they are long running: https://supabase.com/docs/guides/functions/background-tasks which could be helpful in your use case.

David.StantonOP•3w ago

so even after we fall pretty much is the "worst"

garyaustin•3w ago

So... If I recall the 5000 is only a cron UI limit. I don't think pg_net has that limit. So a cron task written in SQL calling pg_net on its own could have longer...

David.StantonOP•3w ago

So we are saying the cron UI is in effect useless, don't use it?

garyaustin•3w ago

https://github.com/orgs/supabase/discussions/37574

GitHub

A maximum timeout of 5000ms when calling an edge function through t...

I am working for a company that uses Supabase as a backend and I am wondering where this limit originates. Is it originating from the pg_cron extension or from the pg_net extension? And how to poss...

David.StantonOP•3w ago

My best guess now is that there is some madness around the "hotness"/startup time. As soon as I put the crons on every 10 mins, it gets back to "stable".

[Image attached; no description]

David.StantonOP•3w ago

Webhooks seem fine and fire in this table on user updates.
