Crons just randomly failing to run
I'm seeing over the last few weeks that cron jobs are just randomly failing to run. I have 3 crons set to run every 15 minutes, on the hour, :15, :30, and :45. Some of them simply didn't run according to any of the logs. These crons fire some edge functions, and I'm seeing the same pattern in the edge function logs, so it leads me to believe it's the cron itself not being run. It was rock solid before; I could see a flat pattern in the logs of 12 runs an hour, and now it's just randomly up and down.

64 Replies
Is the job showing up in the cron logs? https://supabase.com/dashboard/project/_/logs/pgcron-logs
There is also the cron.job_run_details table that should show every cron job when it runs.
Also more debugging: https://supabase.com/docs/guides/troubleshooting/pgcron-debugging-guide-n1KTaz
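Something like this against cron.job_run_details would show the recent runs directly from pg_cron's own bookkeeping, independent of the Supabase log pipeline (the job name here is just a placeholder):

  -- Recent runs for one job; status/return_message come from pg_cron itself.
  select j.jobname, d.status, d.return_message, d.start_time, d.end_time
  from cron.job_run_details d
  join cron.job j using (jobid)
  where j.jobname = 'my-15min-job'   -- placeholder job name
  order by d.start_time desc
  limit 50;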
@silentworks @garyaustin The screenshot I posted is from the logs.
You don’t show any logs. Just some sort of usage report graph.
It might actually be that it's running the cron twice, which would also make the stats look up/down. That's still as bad, running them twice, when each of the 3 jobs is set for on the hour, :15, :30, :45.

@garyaustin It's actually both: it missed 9:15 completely and ran 10:30 twice, a double "bad".
No idea why it would miss, double, or miss and catch up(?), other than the guide mentioning that resources could cause a run issue.
You might see if there are any issues here: https://github.com/citusdata/pg_cron or on the web. Cron is not a Supabase piece of code; it is strictly that 3rd-party extension and Postgres.
Not sure if support could help or not. Also check the cron.job_run_details and confirm it shows the same thing. Are these cron tasks using pg_net (if done with the cron UI they are) to call the edge function or http?
@garyaustin using the cron ui
Then even if the edge function did not respond there would be no hang-up, as it uses pg_net, which does not wait for a response. So that adds to the puzzle.
Did you check if the details table matches the log?
Also check the job table to make sure there are not extra cron jobs there...

I would think jobid can't be duplicated though, but have never checked...
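Listing what is actually scheduled is just a select on the cron.job table, roughly:

  select jobid, jobname, schedule, active, command
  from cron.job
  order by jobname;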
@garyaustin No extra jobs in the jobs table.
Got nothing else. I've done a quick search here, in GitHub discussions, and in the pg_cron repository and don't find any similar cases (within the limits of my search terms).
Unless the logs themselves are wrong?
That is why I suggested checking the details table. That is a pg_cron table. The logs are all supabase things.
It also shows start time and end time which might show if something does not finish(?).
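A rough way to spot runs that errored or never finished, assuming the standard columns:

  select runid, jobid, status, return_message, start_time, end_time,
         end_time - start_time as duration
  from cron.job_run_details
  where status <> 'succeeded' or end_time is null
  order by start_time desc
  limit 100;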
@garyaustin I've found this. Any idea why loads of these are timing out? All the stats in the edge functions etc. say they fired, so I have no real clue what all these "timeouts" are, but they all happen at the times the crons run, which fire the edge functions.

I suspect you set the timeout to 5000 when you created the edge function cron task in the UI (not sure if you said whether you used the UI). This timeout is passed to pg_net. I've never seen a great explanation of what can impact the timeout, but it seems to be overloading pg_net with requests. Are you also doing webhooks on a table with heavy inserts/updates? https://github.com/supabase/pg_net/issues/179#issue-2929010661
If you wrote your own call to the edge function with http, then it is taking longer than the http timeout.
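For context, the UI-generated cron command ends up as a pg_net call roughly like this sketch (URL and key are placeholders); the 5000 is the timeout being discussed:

  select net.http_post(
    url     := 'https://<project-ref>.supabase.co/functions/v1/my-function',  -- placeholder URL
    headers := '{"Content-Type": "application/json", "Authorization": "Bearer <anon-key>"}'::jsonb,
    body    := '{}'::jsonb,
    timeout_milliseconds := 5000   -- the value set in the cron UI
  );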
@garyaustin Yes, the timeout in the crons is set to 5000; it's the max it will take. I do have webhooks that fire if an "account" is updated at any time, and that 1 account is "auto updated" before the cron runs at 0, 15, 30, 45.
Look in your net schema and see how often pg_net is being called.
Where do I find that?
Table UI, select net schema...
There are two tables. One is results and one is launched requests.
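Something like this against the response table should surface the failures, assuming pg_net's usual columns:

  select id, status_code, timed_out, error_msg, created
  from net._http_response
  where timed_out or error_msg is not null
  order by created desc
  limit 100;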
The one I posted above is the other one, firing at the same times as the crons, 0, 15, 30, 45, 3 times at each "time". I'd hardly call that "extensive" or "large use"?

Anything in the response table?
If not, you have not run in a while, as the tables clear out after a fairly short time.
Responses, the one I posted above, with all the timeouts.
I thought that was a cron table.
That was net._http_response?
http_responses, yes.
I see that it is now.
So maybe timeout IS how long your edge function has to respond. As I said it is not clear.
If so and it takes longer than 5 seconds pg_net would error.
OK it appears that is what it is...
So your edge function would have to respond in 5 seconds.
If I read that right. It is still not 100% clear.
It's clear as mud, and if these "timeouts" are what's causing the problems, with the stats etc. looking like they are up/down, it's basically "unfit for purpose"/useless if you can't fire 3 (!) crons.
Not sure what 3 crons would have to do with it IF the timeout is on the edge function.
OK, so a function that can't complete within 5s, one that updates 10 rows, is equally as bad; it should take 500ms to do that, or less.
Agree. You would have to look at if that is the case or not.
Does this run off of "Cloudflare" functions? Because all the headers have Cloudflare in them.
No, edge functions are hosted by Supabase on AWS hardware and run Deno. They don't usually run into issues with DB operations being slow. But if you are using the REST API and doing 10 updates versus 1 update of 10 rows, then latency could add up quickly.
Where do they fire from/to then? How can I set them to fire from us-east to us-east?
They fire closest to the user.
By default
You can force them to a specific region.
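If you want to try it, Supabase documents an x-region header for pinning an invocation to a region; from pg_net that would look roughly like this (URL and region are examples, not your values):

  select net.http_post(
    url     := 'https://<project-ref>.supabase.co/functions/v1/my-function',  -- placeholder URL
    headers := '{"Content-Type": "application/json", "x-region": "us-east-1"}'::jsonb,
    timeout_milliseconds := 5000
  );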
It's a cron, so where is the user lol.
True.
Then should be near them.
Do the edge logs give you a feel for how long the functions are taking?
You could console.log in them to tell.
If the cron fires in Timbuktu to New Zealand, there's probably 4s of latency.
And I'm only speculating that the 5000 msec is the time to respond; it has always been fuzzy to me what the pg_net timeout actually covers.
not really

Not very accurate, I've seen users report larger latency than this before. There are many things to consider when it comes to latency besides just location.
@silentworks Even worse if others are reporting even higher; it's almost a duff product if you can't fire a function in less than 5 seconds.
It's not about the function firing, it's about what the function is doing and how long it takes to get a response.
Are you doing 10 separate REST requests in the function?
A simple return new Response('Hello World') will give an immediate response.
It's updating 20 rows in the db, about as small as it can possibly get. How could anyone use it on 20,000 rows lol.
20 separate updates or one update of 20 rows?
Not trying to make excuses for Supabase edge functions here, but 20 rows could contain any amount of data. If I'm updating 20 rows each with a huge JSON payload, it will take longer than 200 rows with just one column containing a single word of text.
Best thing to do is console.log the edge function start and return to see if this is even it.
At max a 300-line JSON.
I'd suggest you follow Gary's suggestion and also try and call that edge function directly too from your local machine with the same payload and see how long it takes. If the edge function is not timing out then we narrow this down to the cron job itself.
The edge function would not necessarily be timing out. The pg_net call has a timeout of 5000 msec max. I ASSUME that is for the request to respond, but I haven't seen that well defined. So if the edge function has not finished what it is doing in 5s, then you get the timeout.
EDIT I'm not sure you showed cron tasks. I glanced up and it was my own post showing what to look for..... SIGH
Also: the crons you showed up top do NOT seem to be calling edge functions. They just do SQL. That is why I thought you were posting a cron table.
Is this now a different issue with a webhook on update?
That should not be related to cron activity.
DNS:0.032116 Connect:0.053816 TLS:0.141154 TTFB:0.141516 Total:1.684647
Is there some sort of "hotness" involved here with EFs? How long do they stay hot for? The first local test was 4.2 sec; if I fire a few one after another, they're under 2s.
If they stay hot for, say, 5 mins, I can try firing the crons every 5 mins and see if it gets back to "stable".
I'm not sure how long they stay hot for, but they do have a cold start on the first request to that function. So you could try making two requests a few minutes apart.
(Shared link) Persistent Storage and 97% Faster Cold Starts for Edge Functions: mount S3-compatible buckets as persistent file storage in Edge Functions with up to 97% faster cold start times.

Also, you can set up the updates in the function as background tasks if they are long-running (https://supabase.com/docs/guides/functions/background-tasks), which could be helpful in your use case.
So even after that, where we fall is pretty much the "worst".
So...
If I recall the 5000 is only a cron UI limit.
I don't think pg_net has that limit. So a cron task written in SQL calling pg_net on its own could have longer...
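So a hand-written cron task could pass a larger value straight through to pg_net, something like this sketch (names and URL are placeholders, and whether pg_net honours bigger values is worth verifying):

  select cron.schedule(
    'call-edge-function-long-timeout',   -- placeholder job name
    '0,15,30,45 * * * *',
    $$
    select net.http_post(
      url     := 'https://<project-ref>.supabase.co/functions/v1/my-function',  -- placeholder URL
      headers := '{"Content-Type": "application/json"}'::jsonb,
      timeout_milliseconds := 15000      -- example: 15s instead of the UI's 5s
    );
    $$
  );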
So we are saying the cron UI is useless in effect; don't use it.
GitHub discussion: "A maximum timeout of 5000ms when calling an edge function through t..."
"I am working for a company that uses Supabase as a backend and I am wondering where this limit originates. Is it originating from the pg_cron extension or from the pg_net extension? And how to poss..."
My best guess here now is that there is some madness around the "hotness"/startup time; as soon as I put the crons on every 10 mins, it gets back to "stable".

Webhooks seem fine and fire into this table on user updates.