Massive CPU Spikes
Hey everyone. We are experiencing massive CPU spikes for the past 24hours and our queries became suuuuper slow. It always spikes up to 100% before crashing. Since this is a production system, we need immediate help. Already tried a ton of things (optimising queries, checking for loops in trigger etc) but none has helped enough...
As of now, I am not even sure if the slow queries (20+ seconds) are symptoms or cause of the CPU spike. It almost feels like an infinite loop is increasing the CPU load until it crashes, but I cannot see anything that hints to it. We also already upgraded to the 200/month 4CPU instance but the behaviour is the same - it goes up to 100% and crashes.
Any help is highly appreciated since we are out of ideas as of now...

17 Replies
@Thor - ElevenLabs anyone from your team who could help us with this?
This is pretty urgent, as we have paying customers on our application π₯Ά. Any help would be greatly appriciated!
I assume you both have sent in your support requests, that is the best move.
@hejsfj the team is looking at this now and will get back to you here or on the ticket.
Thanks!! It would already help a lot to know whether the slow queries are cause or symptom of the spikes. In the meanwhile, we make improvements to the query performances.
@psteinroe Note that you can find the expensive queries by querying
pg_stat_statements
. All the queries coming from supabase-js use the anon/authenticated
roles.Thanks for the support everyone! We did a throughout analysis of all queries and could bring the query time back to normal and the cpu usage to <5%. Thanks for all the support and hints where to look at! I guess I will be working on a benchmarking tool using the new .explain() endpoint... π does anyone know of a nice query visualiser similar to https://explain.dalibo.com that can be started locally or event generates a picture?
explain.dalibo.com
Visualizing and understanding your PostgreSQL execution plans made easy.
@Thor - ElevenLabs the issue popped up again and we are again experiencing massive CPU usage, which leads to a non responsive system and outages.
It there someone from your team, who would be available for a chat?
@psteinroe maybe you can add some more context βΊοΈ
All queries are optimised, I am beginning to think that this the problem is not in user-land.
looking at more queries now
We are not at a CPU usage of -17867%... π

@hejsfj unfortunately the culprit here (at least partially) might've been the PostgREST version being rolled back to v9 after instance upgrade (as v10 isn't officially supported). This has now been resolved, as a result of the support ticket conversation
@psteinroe I've seen your ticket regarding pg_net permission issues -would you mind describing your workflow around http_request_queue table and running inserts against it?
Its now fixed by this issue comment: https://github.com/supabase/supabase/issues/4883#issuecomment-1073047443
Nevertheless, I think the permissions should be setup properly after an upgrade without any additional setup being required.
GitHub
Permissions denied for table using API but works with SupabaseClien...
Bug report Describe the bug When trying to access data from some API endpoints, like this: /rest/v1/leads?select=* I get this error: { "message": "permission denied for t...
@pcnc We use pg_net to make requests to rest apis (node) from the db. For example, when a user inserts a message, after the insert, we request the submit-message REST api using the auth of the user that inserted the message into the db. Hence, we are still using the authenticated role when executing the http_post function of pg_net.
I agree. We are working on something to make upgrade operations easier for the user to manage themselves - but since we don't usually support upgrades of this nature, there is a chance of issues similar to this popping up. We'll make sure to include that step in the future as well if need arises
Is there a way to limit results to an interval in which the query was executed?
i think its because of the CPU/RAM
whats your system info ?
you may need to limits the logs by making
https://github.com/supabase/supabase/issues/9833
we are using the hosted offering and my current understanding is that the cpu spike is caused by slow queries. I now want to identify those queries. Unfortunately, the query provided by steve returns all slow queries. I need only the queries from the past X hours.
You can at least reset the info period.... https://www.depesz.com/2020/12/21/waiting-for-postgresql-14-pg_stat_statements-track-time-at-which-all-statistics-were-last-reset/