S
Supabaseβ€’3y ago
psteinroe

Massive CPU Spikes

Hey everyone. We are experiencing massive CPU spikes for the past 24hours and our queries became suuuuper slow. It always spikes up to 100% before crashing. Since this is a production system, we need immediate help. Already tried a ton of things (optimising queries, checking for loops in trigger etc) but none has helped enough... As of now, I am not even sure if the slow queries (20+ seconds) are symptoms or cause of the CPU spike. It almost feels like an infinite loop is increasing the CPU load until it crashes, but I cannot see anything that hints to it. We also already upgraded to the 200/month 4CPU instance but the behaviour is the same - it goes up to 100% and crashes. Any help is highly appreciated since we are out of ideas as of now...
No description
17 Replies
hejsfj
hejsfjβ€’3y ago
@Thor - ElevenLabs anyone from your team who could help us with this? This is pretty urgent, as we have paying customers on our application πŸ₯Ά. Any help would be greatly appriciated!
garyaustin
garyaustinβ€’3y ago
I assume you both have sent in your support requests, that is the best move.
Thor - ElevenLabs
Thor - ElevenLabsβ€’3y ago
@hejsfj the team is looking at this now and will get back to you here or on the ticket.
psteinroe
psteinroeOPβ€’3y ago
Thanks!! It would already help a lot to know whether the slow queries are cause or symptom of the spikes. In the meanwhile, we make improvements to the query performances.
steve-chavez
steve-chavezβ€’3y ago
@psteinroe Note that you can find the expensive queries by querying pg_stat_statements. All the queries coming from supabase-js use the anon/authenticated roles.
select *
from pg_stat_statements s
join pg_authid a on s.userid = a.oid where rolname in ('authenticated', 'anon')
order by mean_exec_time desc;
select *
from pg_stat_statements s
join pg_authid a on s.userid = a.oid where rolname in ('authenticated', 'anon')
order by mean_exec_time desc;
psteinroe
psteinroeOPβ€’3y ago
Thanks for the support everyone! We did a throughout analysis of all queries and could bring the query time back to normal and the cpu usage to <5%. Thanks for all the support and hints where to look at! I guess I will be working on a benchmarking tool using the new .explain() endpoint... πŸ˜„ does anyone know of a nice query visualiser similar to https://explain.dalibo.com that can be started locally or event generates a picture?
explain.dalibo.com
Visualizing and understanding your PostgreSQL execution plans made easy.
hejsfj
hejsfjβ€’3y ago
@Thor - ElevenLabs the issue popped up again and we are again experiencing massive CPU usage, which leads to a non responsive system and outages. It there someone from your team, who would be available for a chat? @psteinroe maybe you can add some more context ☺️
psteinroe
psteinroeOPβ€’3y ago
All queries are optimised, I am beginning to think that this the problem is not in user-land. looking at more queries now
psteinroe
psteinroeOPβ€’3y ago
We are not at a CPU usage of -17867%... πŸ˜„
No description
pcnc
pcncβ€’3y ago
@hejsfj unfortunately the culprit here (at least partially) might've been the PostgREST version being rolled back to v9 after instance upgrade (as v10 isn't officially supported). This has now been resolved, as a result of the support ticket conversation @psteinroe I've seen your ticket regarding pg_net permission issues -would you mind describing your workflow around http_request_queue table and running inserts against it?
psteinroe
psteinroeOPβ€’3y ago
Its now fixed by this issue comment: https://github.com/supabase/supabase/issues/4883#issuecomment-1073047443 Nevertheless, I think the permissions should be setup properly after an upgrade without any additional setup being required.
GitHub
Permissions denied for table using API but works with SupabaseClien...
Bug report Describe the bug When trying to access data from some API endpoints, like this: /rest/v1/leads?select=* I get this error: { &quot;message&quot;: &quot;permission denied for t...
psteinroe
psteinroeOPβ€’3y ago
@pcnc We use pg_net to make requests to rest apis (node) from the db. For example, when a user inserts a message, after the insert, we request the submit-message REST api using the auth of the user that inserted the message into the db. Hence, we are still using the authenticated role when executing the http_post function of pg_net.
pcnc
pcncβ€’3y ago
I agree. We are working on something to make upgrade operations easier for the user to manage themselves - but since we don't usually support upgrades of this nature, there is a chance of issues similar to this popping up. We'll make sure to include that step in the future as well if need arises
psteinroe
psteinroeOPβ€’3y ago
Is there a way to limit results to an interval in which the query was executed?
ItsCube
ItsCubeβ€’3y ago
i think its because of the CPU/RAM whats your system info ? you may need to limits the logs by making
ALTER SYSTEM SET log_statement TO 'none';
SELECT pg_reload_conf();
ALTER SYSTEM SET log_statement TO 'none';
SELECT pg_reload_conf();
https://github.com/supabase/supabase/issues/9833
psteinroe
psteinroeOPβ€’3y ago
we are using the hosted offering and my current understanding is that the cpu spike is caused by slow queries. I now want to identify those queries. Unfortunately, the query provided by steve returns all slow queries. I need only the queries from the past X hours.

Did you find this page helpful?