Supabase•3y ago

Massive CPU Spikes

Hey everyone. We are experiencing massive CPU spikes for the past 24hours and our queries became suuuuper slow. It always spikes up to 100% before crashing. Since this is a production system, we need immediate help. Already tried a ton of things (optimising queries, checking for loops in trigger etc) but none has helped enough... As of now, I am not even sure if the slow queries (20+ seconds) are symptoms or cause of the CPU spike. It almost feels like an infinite loop is increasing the CPU load until it crashes, but I cannot see anything that hints to it. We also already upgraded to the 200/month 4CPU instance but the behaviour is the same - it goes up to 100% and crashes. Any help is highly appreciated since we are out of ideas as of now...

17 Replies

hejsfj•3y ago

@Thor - ElevenLabs anyone from your team who could help us with this? This is pretty urgent, as we have paying customers on our application 🥶. Any help would be greatly appriciated!

garyaustin•3y ago

I assume you both have sent in your support requests, that is the best move.

Thor - ElevenLabs•3y ago

@hejsfj the team is looking at this now and will get back to you here or on the ticket.

psteinroeOP•3y ago

Thanks!! It would already help a lot to know whether the slow queries are cause or symptom of the spikes. In the meanwhile, we make improvements to the query performances.

steve-chavez•3y ago

@psteinroe Note that you can find the expensive queries by querying pg_stat_statements. All the queries coming from supabase-js use the anon/authenticated roles.

select *
from pg_stat_statements s
join pg_authid a on s.userid = a.oid where rolname in ('authenticated', 'anon')
order by mean_exec_time desc;

select *
from pg_stat_statements s
join pg_authid a on s.userid = a.oid where rolname in ('authenticated', 'anon')
order by mean_exec_time desc;

psteinroeOP•3y ago

Thanks for the support everyone! We did a throughout analysis of all queries and could bring the query time back to normal and the cpu usage to <5%. Thanks for all the support and hints where to look at! I guess I will be working on a benchmarking tool using the new .explain() endpoint... 😄 does anyone know of a nice query visualiser similar to https://explain.dalibo.com that can be started locally or event generates a picture?

explain.dalibo.com

Visualizing and understanding your PostgreSQL execution plans made easy.

hejsfj•3y ago

@Thor - ElevenLabs the issue popped up again and we are again experiencing massive CPU usage, which leads to a non responsive system and outages. It there someone from your team, who would be available for a chat? @psteinroe maybe you can add some more context ☺️

psteinroeOP•3y ago

All queries are optimised, I am beginning to think that this the problem is not in user-land. looking at more queries now

psteinroeOP•3y ago

We are not at a CPU usage of -17867%... 😄

pcnc•3y ago

@hejsfj unfortunately the culprit here (at least partially) might've been the PostgREST version being rolled back to v9 after instance upgrade (as v10 isn't officially supported). This has now been resolved, as a result of the support ticket conversation @psteinroe I've seen your ticket regarding pg_net permission issues -would you mind describing your workflow around http_request_queue table and running inserts against it?

psteinroeOP•3y ago

Its now fixed by this issue comment: https://github.com/supabase/supabase/issues/4883#issuecomment-1073047443 Nevertheless, I think the permissions should be setup properly after an upgrade without any additional setup being required.

GitHub

Permissions denied for table using API but works with SupabaseClien...

Bug report Describe the bug When trying to access data from some API endpoints, like this: /rest/v1/leads?select=* I get this error: { "message": "permission denied for t...

psteinroeOP•3y ago

@pcnc We use pg_net to make requests to rest apis (node) from the db. For example, when a user inserts a message, after the insert, we request the submit-message REST api using the auth of the user that inserted the message into the db. Hence, we are still using the authenticated role when executing the http_post function of pg_net.

pcnc•3y ago

I agree. We are working on something to make upgrade operations easier for the user to manage themselves - but since we don't usually support upgrades of this nature, there is a chance of issues similar to this popping up. We'll make sure to include that step in the future as well if need arises

psteinroeOP•3y ago

Is there a way to limit results to an interval in which the query was executed?

ItsCube•3y ago

i think its because of the CPU/RAM whats your system info ? you may need to limits the logs by making

ALTER SYSTEM SET log_statement TO 'none';
SELECT pg_reload_conf();

ALTER SYSTEM SET log_statement TO 'none';
SELECT pg_reload_conf();

https://github.com/supabase/supabase/issues/9833

psteinroeOP•3y ago

we are using the hosted offering and my current understanding is that the cpu spike is caused by slow queries. I now want to identify those queries. Unfortunately, the query provided by steve returns all slow queries. I need only the queries from the past X hours.

garyaustin•3y ago

You can at least reset the info period.... https://www.depesz.com/2020/12/21/waiting-for-postgresql-14-pg_stat_statements-track-time-at-which-all-statistics-were-last-reset/

depesz

select * from depesz;

Waiting for PostgreSQL 14 – pg_stat_statements: Track time at which...

Gaming

Programming

Massive CPU Spikes

Did you find this page helpful?