Neon · 2mo ago
stormy-gold

Compute Exhausted Before Official Launch

Hi Neon Team,

We have built an application that has two initial customers signed up. For now, there is only development traffic and some test traffic to the website (it's a Next.js app). I just got a notification that we surpassed the 50 hours included in the free plan. This happened before we had any real customer traffic. As such, I am wondering if we are doing something wrong, and how all the compute could be consumed only 12 days into the month, without any material usage (real visitors) yet? Thank you in advance for the help.

--

EDIT: I checked the "Monitoring > System operations" tab to try to figure out when and how the compute was activated. I noticed it turned on several times throughout the night. In fact, counting from midnight, I see 43 lines of compute activity up until 9am this morning (half are suspend events, half are start events). This pattern of turning on and off (each time staying on for ~250ms) is happening every few minutes; no more than 15 minutes passes between "start" events. I am certain that neither of our two clients was up all night testing the app, and neither was I, so something really odd is going on here. I am unsure how to debug this.

I also see exactly 901 connections (Max) every 15 minutes or so. It goes up to exactly 901, then drops to 0.

In case useful:
- We are storing the connection on globalThis in production, to avoid any unlikely reconnections.
- We are not calling any regular ping to keep the database active.
- We are using Drizzle, with @neondatabase/serverless version ^1.0.1.
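For reference, the globalThis caching pattern described above typically looks something like the sketch below. The choice of the WebSocket Pool driver, the module layout, and the env var name are all assumptions; the post does not show the actual setup.

```ts
// db.ts - illustrative sketch of caching a Drizzle client on globalThis;
// not the app's actual code.
import { Pool } from '@neondatabase/serverless';
import { drizzle } from 'drizzle-orm/neon-serverless';

const globalForDb = globalThis as unknown as {
  db?: ReturnType<typeof drizzle>;
};

// Reuse a single client per process so requests don't reconnect.
// Note that holding this object alone does not wake Neon compute;
// only actual queries (or held-open connections) do.
export const db =
  globalForDb.db ??
  drizzle(new Pool({ connectionString: process.env.DATABASE_URL }));

globalForDb.db = db;
```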
20 Replies
continuing-cyan · 2mo ago
Hey! Would you mind sending a screenshot of your "CU over time" (vCPU) chart? The 250ms you're seeing in the monitoring tab is not how long the compute was on for, but rather how long it took for it to scale up.
stormy-gold (OP) · 2mo ago
Hey @Sam, please find a couple of screenshots below.
stormy-gold (OP) · 2mo ago
(screenshot attached)
stormy-gold (OP) · 2mo ago
This is the one I was talking about:
stormy-gold (OP) · 2mo ago
(screenshot attached)
stormy-gold (OP) · 2mo ago
It is starting again and again. We do not have users using it like that yet, so there is something not quite right here. And through the night:
stormy-gold (OP) · 2mo ago
(screenshot attached)
stormy-gold (OP) · 2mo ago
I think I see part of the problem in our own code that might be contributing (though it still does not explain the full picture). In short, we have a function called getOrganizations, which takes the first path segment (e.g. foo if the path was /foo), then does a Drizzle call to Neon to see if there is an organization entry in the database with foo as the primary key. That function is cached, to avoid contacting Neon again when it is called with the same identifier (i.e. path argument). However, when we cannot find the organization in Neon, we return notFound(), which is a Next.js function that throws a special error that Next.js catches internally, automatically sending a 404 to the visitor.

I see from our logs that we are being hit by quite a lot of requests for paths that do not exist in the database (just random spam bots looking for vulnerabilities, perhaps). Now, perhaps the return value of the function is not actually cached when the function throws an error instead of returning normally.

I could try some kind of early check in middleware, perhaps hitting a remote Redis cache of org names instead of hitting Neon, to check whether the org exists (blocking those that don't). I can't use local memory or files, as we run across several container replicas, and invalidation would not work if the value were cached locally in one replica. The downside is that I would be slowing down middleware (any fetch in middleware feels like bad practice). On second thought, I will first try to return null instead of throwing, to see if that helps. If I try this fix, could you perhaps make an exception and reset the compute, so I can test everything out?

The above would in any case not explain the full picture, as seen from the Sentry log trail shown below.
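A minimal sketch of the return-null idea, assuming the function is wrapped in Next.js's unstable_cache and that a Drizzle table named organizations with a slug column exists (both assumptions; the thread does not show the actual code):

```ts
// organizations.ts - illustrative sketch, not the app's actual code.
import { unstable_cache } from 'next/cache';
import { eq } from 'drizzle-orm';
import { db } from '@/lib/db';                // assumed module path
import { organizations } from '@/lib/schema'; // assumed table definition

// Returning null on a miss makes the miss itself a cacheable value.
// A thrown notFound() aborts before anything is stored, so every
// repeat bot hit for the same bogus slug would reach Neon again.
export const getOrganizations = unstable_cache(
  async (slug: string) => {
    const rows = await db
      .select()
      .from(organizations)
      .where(eq(organizations.slug, slug))
      .limit(1);
    return rows[0] ?? null;
  },
  ['get-organizations'],
  { revalidate: 300 } // re-check misses every 5 minutes (arbitrary)
);
```

The page component would then do `if (!org) notFound()` itself, so the 404 behavior is unchanged while the cached function always returns normally, including on misses.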
stormy-gold (OP) · 2mo ago
(screenshot attached)
stormy-gold (OP) · 2mo ago
As seen, it is frequent, but not as frequent as the compute on/off.
stormy-gold (OP) · 2mo ago
(screenshot attached)
stormy-gold (OP) · 2mo ago
Also, the timestamps do not coincide. We had several 404s between 4am and 6am, but looking at the Neon System operations list, the compute was off between 4:01am and 6:07am. Hmmm.
continuing-cyan · 2mo ago
I'd start by changing your prod database's minimum compute size to 0.25 vCPU; you'll get 4x the uptime, and your traffic can easily be served by it. That way, when the compute wakes up it's cheaper. At your scale I'm not sure adding a Redis cache is really necessary. If your org IDs have a standard form, you could run string-level validation in middleware using a regular expression, to avoid hitting your DB on spammy calls.
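A sketch of that middleware check, assuming slugs are lowercase alphanumerics with hyphens (the pattern is an assumption; as noted further down, the real IDs are slugified company names, so it may need loosening):

```ts
// middleware.ts - illustrative; the slug pattern is an assumption.
import { NextResponse, type NextRequest } from 'next/server';

// Only slugs of this shape can possibly exist, so anything else is
// rejected before the org lookup (and the database) is touched.
const ORG_SLUG = /^[a-z0-9](?:[a-z0-9-]{0,62})?$/;

export function middleware(req: NextRequest) {
  const [identifier] = req.nextUrl.pathname.split('/').filter(Boolean);
  if (identifier && !ORG_SLUG.test(identifier)) {
    return new NextResponse(null, { status: 404 });
  }
  return NextResponse.next();
}

export const config = {
  // Skip framework internals and static assets (adjust as needed).
  matcher: ['/((?!_next/|favicon.ico).*)'],
};
```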
stormy-gold (OP) · 2mo ago
Hi @Sam, I have just turned the compute down to 0.25-1, as advised. Separately, look, it just keeps growing:
stormy-gold (OP) · 2mo ago
(screenshot attached)
stormy-gold (OP) · 2mo ago
(screenshot attached)
stormy-gold (OP) · 2mo ago
It started the compute at 2am. That does not seem right. But I am over on the compute here, and I do not want our first two customers' testing to be slowed down if Neon throttles the traffic due to the limit being hit. (Unfortunately, the IDs are random strings, typically the company names slugified, so the regex idea won't narrow things down much.)
continuing-cyan · 2mo ago
Can you prefix the page with /org? Instead of having the path be example.com/<org-id>, it would be example.com/org/<org-id>, with the getOrganizations call in the page.tsx under /org. That way all these random bots won't spin up your database when they hit paths like /sitemap.xml or /robots.txt.
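In App Router terms, the suggestion amounts to roughly the following sketch (the file path, import path, and `name` column are illustrative):

```tsx
// app/org/[identifier]/page.tsx - sketch of the suggested structure.
// Scanner hits like /robots.txt or /wp-admin no longer match the
// dynamic segment, so they 404 without ever calling getOrganizations.
import { notFound } from 'next/navigation';
import { getOrganizations } from '@/lib/organizations'; // assumed path

export default async function OrgPage({
  params,
}: {
  params: { identifier: string };
}) {
  const org = await getOrganizations(params.identifier);
  if (!org) notFound(); // throw here, not inside the cached function
  return <h1>{org.name}</h1>; // placeholder render; `name` is assumed
}
```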
stormy-gold (OP) · 2mo ago
Hmmm, that's probably a good idea. I will do some testing. Thank you for the tips/advice.

Sorry to bother you with implementation details, but I just realized that the /org path might not work. Clients currently go to <someurl>, and we parse that URL to get the apex domain, then add that as the path. Meaning, if client Acme goes to their subdomain that points to our load balancer IP, e.g. platform.acme.com, then we send them to platform.acme.com/acme, which Next.js processes via dynamic routes (a root-level [identifier] segment, where identifier is then acme). Adding /org, without accessing the database (or a cache) in middleware, will not solve it, as we still have to redirect (or rewrite) all requests to /org/<identifier>, including those from the pesky spammers. As you can see, it is a multi-tenant (white-labeled) solution. Does that make sense? Thanks a million for the thoughts/tips. @Sam
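To illustrate the objection: a host-based rewrite along these lines (illustrative only, with deliberately naive apex parsing) maps every incoming request, bot or not, onto /org/<identifier>, so the static prefix alone filters nothing:

```ts
// middleware.ts - illustrative host-to-path rewrite per the setup
// described above; not the app's actual code.
import { NextResponse, type NextRequest } from 'next/server';

export function middleware(req: NextRequest) {
  const host = req.headers.get('host') ?? '';
  // "platform.acme.com" -> "acme" (a real implementation would need
  // a public-suffix list to handle multi-part TLDs like .co.uk).
  const parts = host.split('.');
  const identifier = parts.length >= 2 ? parts[parts.length - 2] : null;

  if (identifier && !req.nextUrl.pathname.startsWith('/org/')) {
    const rest = req.nextUrl.pathname === '/' ? '' : req.nextUrl.pathname;
    const url = req.nextUrl.clone();
    url.pathname = `/org/${identifier}${rest}`;
    // Every request, including a bot probing /wp-admin, still lands
    // under /org/<identifier> - which is the poster's objection.
    return NextResponse.rewrite(url);
  }
  return NextResponse.next();
}
```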
continuing-cyan · 2mo ago
I see, yeah, that makes sense. If you keep a short regex/blocklist in middleware for the common bot paths, you'll cut out a lot of the bots before they hit your org lookup, so the DB won't wake up unnecessarily.

Did switching from throwing notFound() inside getOrganizations to returning null, and then throwing one layer up, help?
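Such a blocklist could sit at the top of the middleware, ahead of the rewrite sketched earlier. The listed paths are common scanner targets, chosen purely for illustration; a real list would be grown from the app's own logs:

```ts
// middleware.ts (excerpt) - illustrative blocklist sketch.
import { NextResponse, type NextRequest } from 'next/server';

const BOT_PATHS =
  /^\/(wp-admin|wp-login\.php|xmlrpc\.php|\.env|phpmyadmin|\.git)/i;

export function middleware(req: NextRequest) {
  // Short-circuit known scanner paths before the host-based rewrite,
  // so they never reach the org lookup or wake the database.
  if (BOT_PATHS.test(req.nextUrl.pathname)) {
    return new NextResponse(null, { status: 404 });
  }
  // ...host-to-path rewrite continues here, as sketched earlier.
  return NextResponse.next();
}
```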
