Massive storage usage
Hello!
I'm currently trying to migrate my production data to Neon. For that, I upgraded to the Launch plan, as it's more than enough for my app.
The raw data as CSV is less than 1 GB, but when I imported it via DBeaver, the console showed 77 GB of usage, and I now have to pay a lot of extra fees... How could that happen? Is it a bug?
Kind regards
Simon
63 Replies
national-gold•2y ago
Hey, sorry for the trouble. As a paying customer, you can open a support ticket. The support team will sort that out for you. You can access the support ticket form here:
https://console.neon.tech/app/projects?modal=support
rival-blackOP•2y ago
I downgraded the subscription to Free because I was afraid of even higher fees: even after dropping my database, the storage consumption stayed at 77 GB and my bill grew from day to day... Now I've paid €15 extra due to that bug...
national-gold•2y ago
@yanic Can we look at Simon's account and see what happened?
@Yanic ^
rival-blackOP•2y ago
Yanic was already in contact with me, but I got no answer to my question about what will happen with the costs, or whether they could be frozen or something...
I also provided the CSV files to Yanic.
Nevertheless, I would like to go back to the Launch plan... if this storage issue gets fixed.
national-gold•2y ago
Ok. Glad you got in touch with Support. They'll sort that out for you. Might be early next week, though. Thanks for your patience.
rival-blackOP•2y ago
Just as an update...
As I said, I downgraded to the Free plan so I wouldn't get massive charges due to that storage issue.
Now I tried the import again via psql and COPY FROM, as described in your documentation.
I imported the same CSVs, which total less than 1 GB of data... Looking at my console, it again shows nearly 50 GB of storage, which would massively exceed the 10 GB included in the Launch plan...
Please look into that issue, as otherwise Neon is not an option for me... even though I like all the other features very much.
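For reference, the import per table was basically of this form (the table and file names here are just placeholders, not my real ones):
-- in a psql session connected to the Neon database
\copy my_table FROM 'my_table.csv' WITH (FORMAT csv, HEADER true);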
exotic-emerald•2y ago
I have the same issue with storage.
This is my post:
https://discord.com/channels/1176467419317940276/1226867335562924153
If you find a solution, please mention me.
rival-blackOP•2y ago
@Daniel any updates on this? Also about the invoice of $17 for the 2 days I was on the Launch plan? 😐
national-gold•2y ago
Hey Simon, Support will be better able to update you. Do you still have access to the ticket you opened with Yanic? I would write back to him there. If not, I'll find the ticket and post on your behalf.
rival-blackOP•2y ago
Yes, I am in contact with Yanic. It seems that storage is not calculated just from the raw data but includes all indexes as well... I'm not sure if this is really how Neon wants to calculate storage, but if so, maybe you should point that out clearly.
national-gold•2y ago
Thanks for the feedback, Simon. We'll improve that.
rival-blackOP•2y ago
I've now been waiting for feedback for two days. Any updates on that issue?
national-gold•2y ago
Hey Simon, sorry for the delay. I know they are working through answering your questions and questions from other users about storage. A few things I have learned that I can share:
- The Storage on the Billing page currently displays peak storage rather than current storage. To see your current Storage size, check the Projects page.
- "Storage" is logical data size + WAL. So, you could have a small database, but if you keep modifying those records over and over again, your WAL will increase in size each time (WAL is a record of changes).
- The fastest way to reduce your Storage size is to decrease your retention period. On the Launch plan, I believe it is 7 days by default. At 7 days, you are maintaining a 7-day history of all inserts, updates, and deletes for your data. If your database is really active, that can add up fast.
- To see the actual size of your data (without WAL), check the "Data size" value on the Branches widget on your project dashboard (the main page of the Neon Console).
This is general info, not specific to your case.
I can't explain loading a 1 GiB CSV and ending up with 50 GB of data. That certainly does not sound right. Support will have to dig into that question.
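If you want to check your logical data size from SQL as well, this is plain Postgres (nothing Neon-specific):
-- logical size of the current database, without WAL/history
SELECT pg_size_pretty(pg_database_size(current_database()));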
rival-blackOP•2y ago
Thanks for your reply 😀
Does that mean storage cost is calculated from the peak usage and not the current usage? 🧐
My storage was 50 GB even after setting retention to 0 days... which is even more weird 😅
Regarding the $17 I was already charged, can you tell me something about that?
national-gold•2y ago
Yeah, Support will have to look at the 50 GiB issue.
I am not sure about the specifics of your $17 bill, but regarding how "extra storage" works:
- The Launch plan comes with 10 GiB of storage.
- Extra storage is billed in units of 2 GiB at $3.50 each, but that cost is prorated from the date of purchase (the date you went over). If you go over on the last day of the month and end up purchasing 1 extra unit of storage, we don't charge the full $3.50, since you didn't use the extra storage for the whole month.
Here's a Scale plan example to illustrate ($69 a month for the plan, 50 GiB of data included in the base fee, extra storage is in 10 GiB units @ $15 each):
Example 1
1st June–27th June: up to 49 GiB
28th June: my usage went to 55 GiB
29th June: my usage went back down to 45 GiB til the end of the month
Then your bill would be:
Scale plan, 1 month = $69
10 GiB extra storage pack ($15/month), prorated for 28th–30th June (3 of 30 days): 3/30 × $15 = $1.50
Total = $70.50
Note: There's an update coming to display current storage size on the Billing page instead of peak storage size.
rival-blackOP•2y ago
Short update:
It seems that doing a data migration with indexes already defined in the database (which in my opinion should always be the case) makes a huge difference in how much storage ends up being used...
Doing the data migration with 1 GB of raw data and without indexes/foreign keys defined produced about 2 GB of storage usage in the end. Creating the indexes after the data migration succeeded and did not increase storage usage.
But in my opinion this is a bit confusing and not what you would expect... and it's definitely not the common procedure when doing a migration, as you completely lose control over the consistency of your data. Of course, if creating the indexes afterwards succeeds, everything should be fine, but if not... you can spend a while searching for the inconsistencies 😅
Maybe a "data migration mode" would be useful? So that you can do the migration without every insert of the migration producing so much WAL...
Because I guess you don't need point-in-time recovery when initially inserting your data.
national-gold•2y ago
That's an interesting idea. An option to disable WAL for an initial data load. Would speed up load, too. I've seen this on another RDBMS.
rival-blackOP•2y ago
Yes exactly
Another thought on that... if I have all indexes in place but set the retention period to 0 days before inserting data, that should work as well, without producing so much data, right?
national-gold•2y ago
Let me ask our storage team ...
I have not heard back yet. I'll ping you when I do hear back.
rival-blackOP•2y ago
Did you already get some information? 😀
national-gold•2y ago
Hi @Simon Höfle (simon1389) Not yet. I'll check again today.
Hi Simon, I tested your scenario of inserting data into a table with a defined index with and without a retention history.
I created two projects:
1. with_retention (retention history enabled at 1 day)
2. without_retention (retention history set to zero)
I created this table + index in each, and added data:
CREATE TABLE users (id serial, name text);
CREATE INDEX idx_user_id on users (id);
INSERT INTO users (name) SELECT 'alex' FROM generate_series(1,5000000);
Results:
with_retention project:
Data size (logical data size): 347.84 MiB
Storage: 1.17 GiB
without_retention project:
Data size (logical data size): 347.84 MiB
Storage size: 394.72 MiB
Disabling history retention in this scenario does use less storage. I assume it's because the index record creation is WAL logged, but I'll run the scenario again without the index to confirm.
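If anyone wants to measure how much WAL a given load generates, standard Postgres can report it from psql (these are the usual WAL functions, nothing Neon-specific; run the first statement before the load and the last one after):
-- capture the WAL insert position before the load into a psql variable
SELECT pg_current_wal_insert_lsn() AS before_lsn \gset
-- ... run the INSERT / COPY here ...
-- then report how many bytes of WAL were written in between
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_insert_lsn(), :'before_lsn'));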
rival-blackOP•2y ago
Thank you, Daniel, for investigating 🙂
national-gold•2y ago
As a follow-up, I guess this insert won't use "bulk insertion":
INSERT INTO users (name) SELECT 'alex' FROM generate_series(1,5000000);
So, the size difference may not be as large for simple inserts. I'll be testing this out later.
national-gold•2y ago
I found much less difference when loading from CSV using psql with the COPY command. For the same CSV file: 459.27 MiB storage with history enabled, 405.93 MiB storage without history enabled. COPY loads all rows in one command instead of a series of INSERTs.
https://www.postgresql.org/docs/current/populate.html#POPULATE-COPY-FROM
(Link preview: PostgreSQL Documentation — 14.4. Populating a Database)
rival-blackOP•2y ago
Yes, that's how I did the import.
So the main consumer of storage on import is the indexes... right?
national-gold•2y ago
I would say that's the reason, Simon (in the case where you've defined indexes before loading data).
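If you want to verify how the size splits between table data and indexes, a query against the standard Postgres catalogs works (generic Postgres, nothing Neon-specific):
-- per-table heap size vs. total size of its indexes, largest index footprint first
SELECT c.relname,
       pg_size_pretty(pg_relation_size(c.oid)) AS table_size,
       pg_size_pretty(pg_indexes_size(c.oid))  AS index_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r' AND n.nspname = 'public'
ORDER BY pg_indexes_size(c.oid) DESC
LIMIT 10;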
correct-apricot•2y ago
Hi, so in order to import a DB of roughly 10 GB, the best idea is to use COPY statements if possible and disable indexes + retention?
national-gold•2y ago
COPY bulk loads, so it generates less history. Generally, it's advisable to add the indexes back after the load, for a faster import; roughly the order of operations shown in the sketch below. Regarding history retention, you can disable it to further minimize storage, but I'd reserve that for an initial load into a new project. The setting affects all databases on your branch if you have more than one, and when you set it to 0, you're removing your ability to restore your data to a past state (point-in-time recovery).
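A rough sketch of that order of operations for an initial load (table, file, and index names are placeholders, not taken from this thread):
-- 1. create the table without secondary indexes or foreign keys
CREATE TABLE orders (id bigint, customer text, amount numeric);
-- 2. bulk load with COPY: one statement instead of a series of INSERTs
\copy orders FROM 'orders.csv' WITH (FORMAT csv, HEADER true);
-- 3. add constraints and indexes only after the load has succeeded
ALTER TABLE orders ADD PRIMARY KEY (id);
CREATE INDEX idx_orders_customer ON orders (customer);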
rival-blackOP•2y ago
FYI...
I just switched my production system over to Neon successfully 🙂 I'm excited to see how it performs and whether Neon reaches the same performance my CockroachDB cluster had before.
extended-salmon•2y ago
That's fantastic. Do keep in touch and report anything you feel is relevant! We're here to make sure you're successful 🙂
rival-blackOP•2y ago
So... after the first evening with production traffic, I can say that from a functional point of view it works quite well.
From a performance point of view there is a pretty big discrepancy... On average my lambdas are 200 ms slower than before (100 ms before, now 300 ms), and the number of slow queries identified by TypeORM is also much higher now...
At the moment I don't really have an explanation for it...
extended-salmon•2y ago
A few things:
1) Are those latencies due to cold starts? (i.e., is auto-suspend enabled on your Neon database?)
2) Do you have the same sizing configuration between the prior database and your Neon database?
3) Is the Neon database in the same region as your old database?
Sorry if these are basic questions, just want to get a fuller picture.
rival-blackOP•2y ago
1: No, there was no cold start happening, as there was quite a lot of traffic.
2: Sizing was dynamic in CockroachDB... so you cannot configure compute size or anything like that.
3: The region is the same, yes (eu-central-1).
Regarding 2: I have autoscaling configured... but yesterday the DB only scaled up to 0.5 vCPU and later went back down to 0.25.
Would it help to set the minimum to a higher level? I guess not, or would it?
rival-blackOP•2y ago
(screenshot attached)
extended-salmon•2y ago
Setting the minimum to a higher level provides more memory, so you'll get a better cache hit rate.
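If you want to check the cache hit ratio yourself, a standard query against the Postgres statistics views looks like this (generic Postgres, not a Neon-specific API):
-- share of heap reads served from shared buffers rather than disk
SELECT round(100 * sum(heap_blks_hit) / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2) AS cache_hit_pct
FROM pg_statio_user_tables;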
rival-blackOP•2y ago
The cache hit rate is constantly at 100%... is that "good"? Or what does it mean?
extended-salmon•2y ago
Means less disk/network IO. It’s good.
If you run the same query using psql against your old db and neon, does it exhibit the performance discrepancy?
(Use a more realistic query of course 😄)
Basically, I'm trying to determine whether it's a network difference or a DB performance difference.
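For example, something like this in psql against both databases (the query here is just a placeholder, substitute one of your real ones):
\timing on
-- run a representative query a few times and compare the reported "Time:" values
SELECT count(*) FROM users WHERE name = 'alex';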
rival-blackOP•2y ago
What is a bit weird... I'm currently running a Docker container with PgHero to get some insights...
When I look at the queries... there are not many queries with an execution time > 40 ms... and the ones with higher execution times are not executed that often...
Even if the queries with a 40 ms execution time had run in 20 ms on the Cockroach cluster... it would not explain an increase of 300 ms in the overall execution time of my lambda... hm?
So I have now tried a few queries,
and for all of them the timing is more or less the same,
as is how much the execution time varies from run to run.
So that would mean it's a network issue?
like-gold•2y ago
I did some testing with a Neon database using the HTTP driver and the WebSockets driver. The HTTP driver performed significantly slower than the WS driver. I ran the test using Next.js with Vercel, on Cloudflare and on Lambda. The average response time (only the query) using the HTTP driver was around 100 ms, while the WebSocket driver's response was around 5 ms. Hope this helps! @Simon Höfle (simon1389)
I prewarmed everything (the lambda and the Neon database) before running the benchmarks.
Everything was in us-east-1.
rival-blackOP•2y ago
Thank you for your message 😀 I am using TypeORM in my project... not sure what the support for that looks like there.
And I'm not sure... but since I am using the connection pooling URL, the additional delay for the HTTP connection should only affect a few queries, right?
like-gold•2y ago
Did you try a raw Postgres SQL query? Just to see whether the problem is Neon or TypeORM?
Something like...
rival-blackOP•17mo ago
What do you mean? The increase in lambda execution time of about 200-300 ms shouldn't be related to TypeORM, as the code and queries are the same as before?
like-gold•17mo ago
I'm just throwing out ideas; I usually do that when I have no idea what's going on.
Just trying to help.
correct-apricot•17mo ago
https://x.com/stevendsanders/status/1783543516733288685?s=46&t=XcjomLOafFkrdmqP638HKw
Found this, so maybe it's some network latency issue?
It seems like it's currently under investigation: https://x.com/jamesbroadhead/status/1783571708269522959?s=46&t=XcjomLOafFkrdmqP638HKw
(Link preview — Steve Sanders (@StevenDSanders) on X: "Stumbled across a great performance comparison of popular serverless SQL database providers (@PlanetScale, @neondatabase, @tursodatabase) by @capajj")
(Link preview — James Broadhead (@jamesbroadhead) on X: "@capajj @mrjasan @StevenDSanders @PlanetScale @neondatabase @tursodatabase taking a look 👀")
rival-blackOP•17mo ago
Sorry, that wasn't meant to sound impolite on my part 😐
This is a general comparison, right? Not a current issue?
extended-salmon•17mo ago
Hey, so the only thing that changed is the database you're connecting to?
No changes to drivers, ORMs, etc.?
Does TypeORM have its own Postgres client built in, or are you using @neondatabase/serverless?
Can you explain how you're measuring the "overall execution time of my lambda"? For example, is the lambda doing more than just querying Neon, and something else is increasing the execution time?
Maybe some code like this:
extended-salmon•17mo ago
I have an application running that measures latencies. Not seeing anything odd (in relation to the tweet you shared).
(screenshot attached)
rival-blackOP•17mo ago
Yes, the only change was the switch from CockroachDB to Postgres.
TypeORM has built-in drivers for Cockroach and Postgres, which I am using in my app.
Measuring is just the overall duration of my lambda (it's only one lambda with a proxy...).
And it increased significantly after switching to Neon, as shown in the screenshot.
extended-salmon•17mo ago
And you're observing that as a higher response time for the HTTP caller that invoked the Lambda?
rival-blackOP•17mo ago
I actually don't know. But the execution time of the queries does not really differ.
So I'm not sure what else could be the reason.
Short update...
I tried a Postgres cluster at pgEdge. There, my lambda execution time is as fast as it was with CockroachDB before.
That cluster has 2 nodes located in the US and one in Frankfurt (eu-west, Google Cloud).
So for my app only the node in Frankfurt is used, as my lambda is also located there...
I assume the significantly higher latency with Neon really comes from a higher HTTP response time...
Just as a hint for you, in case you want to investigate, as I expect other Neon users will also experience this issue.
@ShinyPokemon
correct-apricot•17mo ago
So the neon.tech Frankfurt region is slower than the pgEdge Frankfurt location?
rival-blackOP•17mo ago
I don't think it's because of the region, but rather due to a connection latency issue.
extended-salmon•17mo ago
This is strange. Can you share some code? I tested from London to Frankfurt using the Neon HTTP driver once, and the latency was in the very low double digits IIRC, so the latency you're seeing is unusual.
rival-blackOP•17mo ago
Mh... sharing code is a bit difficult...
But basically it's a NestJS app which I run in a single lambda proxy function (located in eu-central-1).
NestJS provides TypeORM support, which executes the database queries.
And just for comparison:
this is the average execution duration of my lambda function today:
before 16:30 it was running with Neon as the database;
then I migrated the data and switched to pgEdge... latency went down to the values I had before the switch from Cockroach to Neon.
rival-blackOP•17mo ago
(screenshot attached)
extended-salmon•17mo ago
@Simon Höfle (simon1389) what driver are you using with TypeORM?
extended-salmon•17mo ago
Alright, I did a quick test from an EU Central Lambda to Neon in the same region. Code: https://gist.github.com/evanshortiss/924524b2b086182fd6e2c0a0ef0f5fee
Result with the numbers in milliseconds:
Are you positive your database and lambda are in the same region? Double check that, and double check the driver. Happy to chat if you'd like.
extended-salmon•17mo ago
I updated the code to include connection time. New result:
rival-blackOP•17mo ago
Maybe then it's somehow related to how TypeORM handles connections with Neon?
extended-salmon•17mo ago
If you're sure about the region etc., can you try adding the timing code to your application? How are you timing it and building the graph at the moment?
rival-blackOP•17mo ago
It's the automatically generated monitor in the AWS console for my lambda.
extended-salmon•17mo ago
Ah, OK. You'd best measure the actual query times to try and find the root cause. Let me know how I can help.
I can test TypeORM later and see if it's slower for me.