Crons just randomly failing to run

Over the last few weeks, by the looks of it, cron jobs are just randomly failing to run. I have 3 crons set to run every 15 min: on the hour, 15, 30, 45. Going by the logs, some of them just randomly didn't run. These crons fire some edge functions, and I'm seeing the same pattern in the edge function logs, which leads me to believe it's the cron just not being run. It was rock solid: I could see a "flat pattern" in the logs, 12 an hour. Now it's just randomly up/down.
64 Replies
silentworks
silentworks4w ago
Is the job showing up in the cron logs? https://supabase.com/dashboard/project/_/logs/pgcron-logs
garyaustin
garyaustin4w ago
There is also the cron.job_run_details table, which should show every cron job run. Also, more debugging: https://supabase.com/docs/guides/troubleshooting/pgcron-debugging-guide-n1KTaz
David.Stanton
David.StantonOP4w ago
@silentworks @garyaustin the screenshot I posted is from the logs
garyaustin
garyaustin4w ago
You don’t show any logs. Just some sort of usage report graph.
David.Stanton
David.StantonOP4w ago
It might actually be that it's running the cron twice, which would make the stats look up/down. It's still just as bad to run them twice when each of the 3 jobs is set for the hour, 15, 30, 45.
David.Stanton
David.StantonOP4w ago
@garyaustin it's actually both: it missed 9:15 completely and ran 10:30 twice, a double "bad"
garyaustin
garyaustin4w ago
No idea why it would miss or double, or miss and catch up, other than the guide mentioning that resources could cause a run issue. You might see if there are any issues here: https://github.com/citusdata/pg_cron or on the web. Cron is not Supabase code; it is strictly that third-party extension and Postgres.
Not sure if support could help or not. Also check the cron.job_run_details and confirm it shows the same thing. Are these cron tasks using pg_net (if done with the cron UI they are) to call the edge function or http?
David.Stanton
David.StantonOP4w ago
@garyaustin using the cron ui
garyaustin
garyaustin4w ago
Then even if the edge function did not respond there would be no hang up as it uses pg_net which does not wait for a response. So adds to the puzzle.
garyaustin
garyaustin4w ago
Did you check if the details table matches the log? Also check the job table to make sure there are not extra cron jobs there...
garyaustin
garyaustin4w ago
I would think jobid can't be duplicated though, but have never checked...
David.Stanton
David.StantonOP4w ago
@garyaustin no extra jobs in jobs table
garyaustin
garyaustin4w ago
Got nothing else. I've done a quick search here, github discussions and pg_cron repository and don't find any similar cases (within limits of search terms hitting).
David.Stanton
David.StantonOP4w ago
unless the logs themselves are wrong?
garyaustin
garyaustin4w ago
That is why I suggested checking the details table. That is a pg_cron table. The logs are all supabase things. It also shows start time and end time which might show if something does not finish(?).
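One way to compare the details table against the expected schedule is to export the start_time values from cron.job_run_details for one job and check them against the quarter-hour slots. The sketch below is only an illustration: `auditRuns` is a hypothetical helper, the timestamps are made-up sample data, and it assumes UTC ISO strings and a `from` value that sits exactly on a slot boundary.

```typescript
// Hypothetical helper: compares run start times against the expected
// quarter-hour slots (:00, :15, :30, :45) and reports gaps and repeats.
const SLOT_MS = 15 * 60 * 1000;

interface SlotReport {
  missing: string[];    // expected slots with no run at all
  duplicated: string[]; // slots with more than one run
}

function auditRuns(startTimes: string[], from: string, to: string): SlotReport {
  // Count runs per slot, rounding each start time down to its slot.
  const counts: Record<number, number> = {};
  for (const t of startTimes) {
    const slot = Math.floor(Date.parse(t) / SLOT_MS) * SLOT_MS;
    counts[slot] = (counts[slot] ?? 0) + 1;
  }
  const report: SlotReport = { missing: [], duplicated: [] };
  // Walk every expected slot in [from, to] and classify it.
  for (let s = Date.parse(from); s <= Date.parse(to); s += SLOT_MS) {
    const n = counts[s] ?? 0;
    const label = new Date(s).toISOString();
    if (n === 0) report.missing.push(label);
    if (n > 1) report.duplicated.push(label);
  }
  return report;
}

// Made-up sample shaped like the case in this thread:
// nothing at 09:15, and two runs in the 10:30 slot.
const sampleRuns = [
  "2024-01-01T09:00:02Z",
  "2024-01-01T09:30:01Z",
  "2024-01-01T09:45:03Z",
  "2024-01-01T10:00:02Z",
  "2024-01-01T10:15:01Z",
  "2024-01-01T10:30:01Z",
  "2024-01-01T10:30:04Z",
];
const sampleReport = auditRuns(sampleRuns, "2024-01-01T09:00:00Z", "2024-01-01T10:30:00Z");
console.log(sampleReport);
```

If the report from the details table disagrees with the dashboard logs, that points at the logs rather than at pg_cron.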
David.Stanton
David.StantonOP3w ago
@garyaustin I've found this. Any ideas why loads of these are timing out? All the stats in the edge functions etc. say they fired, so I've got no real clue what all these "timeouts" are, but they are all happening at the times the crons run, which fire the edge functions.
garyaustin
garyaustin3w ago
I suspect you set the timeout to 5000 when you created the edge function cron task in the UI (not sure if you said whether you used the UI). This timeout is passed to pg_net. I've never seen a great explanation of what can impact the timeout, but it seems to be caused by overloading pg_net with requests. Are you also doing webhooks on a heavily inserted/updated table? https://github.com/supabase/pg_net/issues/179#issue-2929010661 If you wrote your own call with http to the edge function, it is taking longer than the timeout for http.
GitHub
Currently Does not Scale Well · Issue #179 · supabase/pg_net
Bug report Once pg_net reaches a certain number of jobs, jobs start to time out aggressively. This makes it unsuitable for any type of production workload where you may be dealing with a burstable ...
David.Stanton
David.StantonOP3w ago
@garyaustin yes, the timeout in the crons is set to 5000; it's the max it will take. I do have webhooks that fire if an "account" is updated at any time, so that 1 account is "auto updated" before the cron runs on the 0, 15, 30, 45
garyaustin
garyaustin3w ago
Look in your net schema and see how often pg_net is being called.
David.Stanton
David.StantonOP3w ago
where do i find that?
garyaustin
garyaustin3w ago
Table UI, select net schema... There are two tables. One is results and one is launched requests.
David.Stanton
David.StantonOP3w ago
the one i put above is the other one, firing at the same times as the crons, 0,15,30,45, 3 times on each "time", i'd hardly call that "extensive" "large use"?
garyaustin
garyaustin3w ago
Anything in the response table? If not, nothing has run in a while, as the tables clear out after a short time.
David.Stanton
David.StantonOP3w ago
responses, the one I put above, with all the timeouts
garyaustin
garyaustin3w ago
I thought that was a cron table. That was net._http_response?
David.Stanton
David.StantonOP3w ago
_http_response, yes
garyaustin
garyaustin3w ago
I see that it is now. So maybe timeout IS how long your edge function has to respond. As I said it is not clear. If so and it takes longer than 5 seconds pg_net would error.
garyaustin
garyaustin3w ago
OK it appears that is what it is...
garyaustin
garyaustin3w ago
So your edge function would have to respond in 5 seconds. If I read that right. It is still not 100% clear.
David.Stanton
David.StantonOP3w ago
It's clear as mud, and if these "timeouts" are what's causing the problems, with the stats etc. looking like they are up/down, it's basically completely "unfit for purpose"/useless if you can't fire 3 (!) crons
garyaustin
garyaustin3w ago
Not sure what 3 crons would have to do with it IF the timeout is on the edge function.
David.Stanton
David.StantonOP3w ago
ok, so a function that can't complete within 5 s when it updates 10 rows is equally bad; it should take 500 ms to do that, or less
garyaustin
garyaustin3w ago
Agree. You would have to look at if that is the case or not.
David.Stanton
David.StantonOP3w ago
Does this work off of "Cloudflare" functions? Because all the headers have Cloudflare in them
garyaustin
garyaustin3w ago
No, edge functions are hosted by Supabase on AWS hardware and run deno. Don't usually run into issues for DB operations being slow. But if you are using the REST API and doing 10 updates versus 1 update of 10 rows, then latency could add up quick.
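The point about 10 separate updates versus 1 update of 10 rows is simple arithmetic: if the calls run one after another, each one pays its own network round trip. A rough sketch, where `estimateTotalMs` is a hypothetical helper and the 150 ms, 20 ms and 40 ms figures are invented for illustration rather than measured:

```typescript
// Rough model: sequential REST calls, each costing one network round trip
// plus some database time. All numbers used below are illustrative guesses.
function estimateTotalMs(
  requests: number,
  roundTripMs: number,
  dbMsPerRequest: number,
): number {
  return requests * (roundTripMs + dbMsPerRequest);
}

// True when the estimated duration stays under the caller's timeout.
function fitsTimeout(totalMs: number, timeoutMs: number): boolean {
  return totalMs < timeoutMs;
}

// 10 separate updates, one round trip each.
const separateMs = estimateTotalMs(10, 150, 20);
// 1 batched update touching all 10 rows (assumed slightly slower in the DB).
const batchedMs = estimateTotalMs(1, 150, 40);

console.log({ separateMs, batchedMs });
console.log(fitsTimeout(separateMs, 5000), fitsTimeout(batchedMs, 5000));
```

With these made-up numbers both variants still fit a 5000 ms budget, but the separate-call version scales linearly with row count, so a cold start or a slower link eats the remaining headroom much sooner.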
David.Stanton
David.StantonOP3w ago
where do they fire from/ to then, how can i set fire from us-east to us-east?
garyaustin
garyaustin3w ago
By default, they fire closest to the user. You can force them to a specific region.
David.Stanton
David.StantonOP3w ago
it's a cron, so where is the user lol
garyaustin
garyaustin3w ago
True. Then should be near them. Do the edge logs give you a feel for how long the functions are taking? You could console.log in them to tell.
David.Stanton
David.StantonOP3w ago
if the cron fires from Timbuktu to New Zealand, there's probably 4 s of latency
garyaustin
garyaustin3w ago
And I'm only speculating that the 5000 ms is the time to respond. It has always been fuzzy, at least to me, what the timeout is for pg_net.
David.Stanton
David.StantonOP3w ago
not really
silentworks
silentworks3w ago
Not very accurate, I've seen users report larger latency than this before. There are many things to consider when it comes to latency besides just location.
David.Stanton
David.StantonOP3w ago
@silentworks even worse if others are reporting even higher. It's almost a duff product if you can't fire a function in less than 5 seconds
silentworks
silentworks3w ago
It's not about the function firing, it's about what the function is doing and how long it takes to get a response.
garyaustin
garyaustin3w ago
Are you doing 10 separate REST requests in the function?
silentworks
silentworks3w ago
A simple return Response('Hello World') will give an immediate response.
David.Stanton
David.StantonOP3w ago
it's updating 20 rows in the db, about as small as it can possibly get. How could anyone use it on 20000 rows lol
garyaustin
garyaustin3w ago
20 separate updates or one update of 20 rows?
silentworks
silentworks3w ago
Not trying to make excuses for Supabase edge functions here, but 20 rows could contain any amount of data. If I'm updating 20 rows with a huge json payload it will take longer than 200 rows with just one column with a single word text.
garyaustin
garyaustin3w ago
Best thing to do is console.log the edge function start and return to see if this is even it.
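A minimal sketch of that kind of start/return logging, assuming nothing about the actual function: `startTimer` is a hypothetical helper, and the clock is injectable only so the example can be checked without waiting on real time.

```typescript
// Hypothetical helper: logs when work starts and how long it took.
type Clock = () => number;

function startTimer(label: string, now: Clock = Date.now): () => number {
  const startedAt = now();
  console.log(`[${label}] start`);
  // Call the returned function right before returning the response.
  return function finish(): number {
    const elapsedMs = now() - startedAt;
    console.log(`[${label}] done in ${elapsedMs} ms`);
    return elapsedMs;
  };
}

// Example with a fake clock so the result is deterministic.
const fakeTicks = [1000, 1250];
let tickIndex = 0;
const finishSample = startTimer("sync-accounts", () => fakeTicks[tickIndex++] ?? 0);
const sampleElapsedMs = finishSample();
```

Inside an edge function, the idea would be to call `startTimer` as the first line of the handler and `finish()` just before the response is returned. Elapsed values close to or above the 5000 ms set in the cron UI would support the timeout theory.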
David.Stanton
David.StantonOP3w ago
at max a 300 line json
silentworks
silentworks3w ago
I'd suggest you follow Gary's suggestion and also try and call that edge function directly too from your local machine with the same payload and see how long it takes. If the edge function is not timing out then we narrow this down to the cron job itself.
garyaustin
garyaustin3w ago
The edge function would not necessarily be timing out. The pg_net call has a timeout of 5000 ms max. I ASSUME that is for the request to respond, but I have not seen that well defined. So if the edge function has not finished what it is doing in 5 s, then you get the timeout.
EDIT: I'm not sure you showed cron tasks. I glanced up and it was my own post showing what to look for... SIGH
Also, the crons you showed up top do NOT seem to be calling edge functions. They just do SQL. That is why I thought you were posting a cron table. Is this now a different issue with a webhook on update? That should not be related to cron activity.
David.Stanton
David.StantonOP3w ago
DNS:0.032116 Connect:0.053816 TLS:0.141154 TTFB:0.141516 Total:1.684647
Is there some sort of "hotness" involved here in EFs? How long do they stay hot for? The first local test was 4.2 s; if I fire a few one after another, they come in under 2 s. If they stay hot for, say, 5 mins, I can try firing the crons every 5 mins and see if it gets "back to stable"
silentworks
silentworks3w ago
I'm not sure how long they stay hot for, but they do have a cold start on the first request to that function. So you could try making two requests a few minutes apart.
garyaustin
garyaustin3w ago
Persistent Storage and 97% Faster Cold Starts for Edge Functions
Mount S3-compatible buckets as persistent file storage in Edge Functions with up to 97% faster cold start times.
silentworks
silentworks3w ago
Also, you can set up the updates in the function to be background tasks if they are long running https://supabase.com/docs/guides/functions/background-tasks which could be helpful in your use case.
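The background-task idea can be sketched as follows. This is an assumption-heavy illustration, not the guide's code: `handleCron`, `updateRows` and `Reply` are hypothetical names, and `waitUntil` stands in for the `EdgeRuntime.waitUntil` call described in the linked guide. It is injected here so the sketch runs outside the edge runtime.

```typescript
// Stand-in for EdgeRuntime.waitUntil (assumed from the background-tasks guide).
type WaitUntil = (work: Promise<unknown>) => void;

// Simplified reply shape; a real function would return a Response object.
interface Reply {
  status: number;
  body: string;
}

// Placeholder for the slow database updates.
async function updateRows(): Promise<void> {
  // ...the row updates would go here...
}

function handleCron(waitUntil: WaitUntil): Reply {
  // Hand the slow work to the runtime so it can keep running after the reply.
  waitUntil(updateRows());
  // Reply right away so the pg_net caller is not kept waiting.
  return { status: 202, body: "accepted" };
}

// Example call with a stub that just records the pending work.
const pendingWork: Promise<unknown>[] = [];
const sampleReply = handleCron((work) => {
  pendingWork.push(work);
});
console.log(sampleReply.status, pendingWork.length);
```

The trade-off is that pg_net would record a success before the updates have finished, so any failure in the background work has to be logged inside the function itself.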
David.Stanton
David.StantonOP3w ago
so even after we fall pretty much is the "worst"
garyaustin
garyaustin3w ago
So... If I recall the 5000 is only a cron UI limit. I don't think pg_net has that limit. So a cron task written in SQL calling pg_net on its own could have longer...
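A cron task written by hand might look roughly like the SQL this helper generates. It is a sketch under assumptions: `buildCronSql` is a hypothetical helper, the job name and URL are placeholders, and whether a value above 5000 is actually honoured is exactly the unconfirmed point above. A real edge function call would normally also need an Authorization header, which is left out here.

```typescript
// Hypothetical helper: builds the SQL text for a pg_cron job that calls an
// edge function through pg_net with an explicit timeout.
function buildCronSql(
  jobName: string,
  schedule: string,
  url: string,
  timeoutMs: number,
): string {
  // Double single quotes so values are safe inside SQL string literals.
  const q = (s: string): string => s.replace(/'/g, "''");
  return [
    "select cron.schedule(",
    `  '${q(jobName)}',`,
    `  '${q(schedule)}',`,
    "  $$",
    "  select net.http_post(",
    `    url := '${q(url)}',`,
    "    body := '{}'::jsonb,",
    `    timeout_milliseconds := ${Math.floor(timeoutMs)}`,
    "  );",
    "  $$",
    ");",
  ].join("\n");
}

// Placeholder values; replace the URL with the real function endpoint.
const sampleSql = buildCronSql(
  "sync-accounts",
  "0,15,30,45 * * * *",
  "https://<project-ref>.supabase.co/functions/v1/<function-name>",
  10000,
);
console.log(sampleSql);
```

The printed statement could be pasted into the SQL editor. The resulting job should then appear in cron.job alongside the ones created through the UI.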
David.Stanton
David.StantonOP3w ago
so we are saying the cron UI is in effect useless, don't use it
garyaustin
garyaustin3w ago
GitHub
A maximum timeout of 5000ms when calling an edge function through t...
I am working for a company that uses Supabase as a backend and I am wondering where this limit originates. Is it originating from the pg_cron extension or from the pg_net extension? And how to poss...
David.Stanton
David.StantonOP3w ago
My best guess now is that there is some madness around the "hotness"/startup time: as soon as I set the crons to every 10 mins, it gets back to "stable"
David.Stanton
David.StantonOP3w ago
webhooks seem fine and fire in this table on user updates
