Disk space costs
There is a lot to love about Neon - my team is currently developing multiple projects on it and we are very happy. But I just wanted to share some experiences I had this week. We tried to pitch two companies we are working with on Neon over RDS/Aurora for new greenfield projects that we are preparing to move to production. Both were open to Neon until they saw the pricing page and realized storage would run them $1.50 list per GB.

I have learned not to question a company's specific business model, but the reaction to the disk space pricing was eye opening. I don't see either company exceeding more than a TB of data, but with 7-day retention, that quickly becomes 7TB and around $10k of monthly costs. Put in perspective, 7TB of storage with a data warehouse SaaS both companies use runs under $200 per month.

As I mentioned, I don't claim to know the input costs into Neon, so I'll assume there is a reason to price disk space where it is. But in the two cases I mentioned, both clients didn't even want to bother going through the sales process to see what discounts at volume might be available and simply decided to stick with what they were familiar with (AWS). If there are significant discounts available for disk space at volume, it might be worth being transparent about that on the pricing page so potential clients can clearly see what the costs might be as they reach larger production volumes.

My experience is that disk space is seen purely as a commodity these days, and pricing at a significant premium over competitors distracts from the overall value Neon brings to the table.
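For reference, here is the rough math behind that $10k figure. It assumes each of the 7 retained days is a full copy of a ~1 TB database (an assumption that gets questioned later in the thread):

```python
# Back-of-the-envelope math, assuming 7-day retention means 7 full copies of ~1 TB.
list_price_per_gb = 1.50   # $/GB-month, from the pricing page as quoted above
database_size_gb = 1_000   # roughly 1 TB of live data
retention_days = 7

billed_gb = database_size_gb * retention_days     # 7,000 GB if every day were a full copy
monthly_cost = billed_gb * list_price_per_gb
print(f"{billed_gb} GB -> ${monthly_cost:,.0f}/month")   # ~ $10,500/month
```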
28 Replies
wise-white•2y ago
@algads when talking about TB scales, it is best to talk to the sales team.
ambitious-aquaOP•2y ago
I get that, but both clients just proverbially shrugged their shoulders and said they were sticking with RDS. Basically, their teams took the position that they know AWS, they are familiar with it, so why would they bother going through the sales process with a new, unknown company whose public pricing on disk space was so out of whack with what they are familiar with. As feedback, I'd suggest being more transparent with disk space pricing and including wider tiers on the pricing page without having to go through sales, e.g. 50-100GB: $1.50/GB, 100-250GB: $1.25/GB, etc. As I said, just a suggestion given my recent experience.
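To illustrate the kind of calculator a published tier table would allow, here is a quick sketch; the tier boundaries and rates are just the examples above, not Neon's actual pricing:

```python
# Illustrative only: tiers are (ceiling_gb, $/GB-month) using the example rates above.
TIERS = [
    (100, 1.50),   # up to 100 GB at $1.50/GB
    (250, 1.25),   # 100-250 GB at $1.25/GB
]

def monthly_storage_cost(gb: float) -> float:
    cost, floor = 0.0, 0.0
    for ceiling, rate in TIERS:
        if gb <= floor:
            break
        cost += (min(gb, ceiling) - floor) * rate
        floor = ceiling
    if gb > TIERS[-1][0]:
        raise ValueError("above the published tiers -- talk to sales")
    return cost

print(monthly_storage_cost(180))   # 100*1.50 + 80*1.25 = $250/month
```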
passive-yellow•2y ago
@algads this is super useful and timely feedback - thank you for sharing it. And thank you for sharing Neon with clients (even if it doesn't land)
ambitious-aquaOP•2y ago
@andyhats I just read the announcement, which seems to be about a month old, re: bring your own S3. That seems like a real game changer long term. Kudos. This is the type of feature that some of the clients I mentioned would actually see as a big positive during evaluation. A handful of questions:
1) Is the data in S3 searchable with other tools that are not Neon (this would actually be huge)? I assume not, since the files are actually page files and not in something like Parquet or Avro format, but I figured I'd ask.
2) Assuming it can't be directly read, can the data be easily read and transformed into another format (i.e. moved from the page file format to Parquet in another bucket for reading by DuckDB or another such tool)?
3) Does this impact the entire conversation surrounding disk space pricing?
wise-white•2y ago
@algads re 1) what use cases did you have in mind? Running OLAP queries over the same on-disk data?
ambitious-aquaOP•2y ago
@Tristan Partin yea, exactly. Right now most of the pipelines we build would normally use CDC via Debezium to push to NATS, and then a worker would move the data to S3 in Avro or Parquet format for auditing and analysis purposes. In some cases it would also get moved out to warehouses like Snowflake, although we are making use of DuckDB more and more these days. Being able to read directly from S3, or use an S3 trigger to start a Lambda that reads it out to a more suitable format, would simplify the described architecture.
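For context, a rough sketch of that worker step (consume Debezium change events from NATS, land them in S3 as Parquet). The subject, bucket, and event handling are placeholders; a real pipeline would also flush on time, track schemas, and handle errors:

```python
import asyncio
import io
import json

import boto3
import nats                      # nats-py client
import pyarrow as pa
import pyarrow.parquet as pq

BATCH_SIZE = 1000                # placeholder: flush after this many change events

async def run():
    nc = await nats.connect("nats://localhost:4222")   # placeholder NATS URL
    sub = await nc.subscribe("cdc.public.orders")      # placeholder Debezium subject
    s3 = boto3.client("s3")
    batch = []
    async for msg in sub.messages:
        event = json.loads(msg.data)
        row = event.get("after")                        # Debezium "after" row image
        if row:
            batch.append(row)
        if len(batch) >= BATCH_SIZE:
            buf = io.BytesIO()
            pq.write_table(pa.Table.from_pylist(batch), buf)
            s3.put_object(Bucket="audit-lake",          # placeholder bucket/key
                          Key="orders/batch.parquet",
                          Body=buf.getvalue())
            batch.clear()

asyncio.run(run())
```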
wise-white•2y ago
Specifically talking about duckdb, have you seen https://duckdb.org/docs/extensions/postgres.html
wise-white•2y ago
You could use a read-only replica and allow DuckDB to read from it
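Something like this, as a minimal sketch; the connection string is a placeholder for a Neon read-only endpoint:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL postgres; LOAD postgres;")
# Attach the read-only endpoint so analytical scans never hit the primary.
con.execute("""
    ATTACH 'host=ep-replica.example.neon.tech dbname=main user=analytics password=***'
    AS pg (TYPE postgres, READ_ONLY);
""")
# Query Postgres tables directly through the extension, or copy them into
# local DuckDB tables for heavier analytics.
print(con.sql("SELECT count(*) FROM pg.public.orders").fetchall())
```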
I feel like I remember @Conrad Ludgate talking about parquet files too
absent-sapphire•2y ago
Neon does use Parquet, but only for our own analytics. I don't think any of our storage software uses Parquet
wise-white•2y ago
I feel like you could probably rig something up in Kafka, where it reads from Postgres and outputs Parquet files into another S3 bucket. Though that would require our logical replication to work again, haha
ambitious-aquaOP•2y ago
We have used the DuckDB extension, but it's still using Postgres under the covers unless we then copy into DuckDB. DuckDB over Postgres is faster than "plain old" PG, but not orders of magnitude faster. The copy from PG to DuckDB is great for ad hoc work and can be set up as a cron, but it's not the greatest setup when it comes to visibility, reliability, etc. Our current approach is to use Postgres logical replication -> Debezium -> NATS and/or Redpanda and avoid Kafka, but it's the same architecture you raised with Kafka. Ultimately, for now, logical replication to whatever pipeline seems like the best approach.

It would be awesome if somewhere down the line the storage servers could write multiple representations of the data to S3 - one for Neon Postgres interactions and another to create a data lake for ad hoc analytics. That would be a killer feature! Customers could get everything PG brings to the table for OLTP, but also get a data lake representation for OLAP without all the ETL headaches.
magic-amber•2y ago
The OP wrote "I don't see either company exceeding more than a TB of data, but with 7 day retention, that quickly becomes 7TB and around $10k of monthly costs." Is this a correct approximation of retention costs? This calculation essentially assumes that each day keeps a full copied snapshot. I was thinking the retention size is only the size of the data changed over the last 7 days.
absent-sapphire•2y ago
You are correct. Retention only accounts for the change in data. If you replace 1TB every day, then you would reach 7TB. But if you have 1TB and you only add 1GB a day, then it would be ~1.032TB
It is likely a bit more complicated than that, but that's my understanding of how the storage calculations work
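In other words, roughly this (a toy model; the real metering is more nuanced):

```python
def approx_billed_gb(live_gb: float, changed_gb_per_day: float, retention_days: int = 7) -> float:
    # live data plus the history (changed data) retained inside the window
    return live_gb + changed_gb_per_day * retention_days

print(approx_billed_gb(1000, 1))     # 1 TB + ~7 GB of history -> ~1,007 GB
print(approx_billed_gb(1000, 1000))  # churning the full 1 TB daily -> history dominates
```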
magic-amber•2y ago
Phew, thanks @Conrad Ludgate for the speedy reply!
absent-sapphire•2y ago
Usage metrics - Neon Docs: "As described in How billing works, each of Neon's plans includes Storage, Compute, and Project usage allowances. The Launch and Scale plans permit extra usage."
ambitious-aquaOP•2y ago
Ah, that is much different from my understanding, thanks. Still very expensive for disk cost, but more reasonable with the retention. One thing I had wondered about: if we bring our own S3, are the disk costs reduced substantially, since our account would incur much of the storage cost and API usage fees? Obviously there is still storage going on at the pageserver and storage server level, but having high disk costs in addition to the S3 costs we'd incur kinda kills the interest in that feature. Totally understand if this is something the team is still figuring out since it's a recently announced feature, but thought I'd ask.
absent-sapphire•2y ago
S3 doesn't actually account for much of our storage costs. Our pageservers would be the most significant cost when it comes to storage
ambitious-aquaOP•2y ago
Interesting, good to know.
magic-amber•2y ago
@algads - Curious: when your clients saw the $1.50/GB price, were they assuming they needed to multiply their GB usage by n days of retention? If so, are they now more open to Neon with this clarification?
From my interpretation of Neon's pricing, it seems like Neon simplified the pricing recently and in that GB price, Neon is baking in a lot of otherwise expensive features like branching and point-in-time recovery
foreign-sapphire•2y ago
Thanks for the feedback. For many workloads, our pricing is competitive, especially if the serverless element is useful (for example, with RDS you might have a read-only replica running 24/7, whereas with Neon you might instantly create one, use it for 20 minutes, then destroy it).
At the same time, our pricing isn't cheaper for every workload. We're aware of this and are actively looking for ways to reduce our running costs without impacting reliability or performance, and we'd reflect that in our pricing.
magic-amber•2y ago
Thanks @Mike J. In Neon, what's the use case for Neon read replicas? When I search for "read replica" in the Neon docs, there are no search results. My understanding of Neon scaling is that a customer picks a plan, like Launch or Scale. They all have the same default compute, 0.25 vCPU and 1GB RAM, but the different plans autoscale to different ceilings, e.g. the Scale plan can (but may not necessarily) autoscale to 8 vCPU and 32GB RAM. The customer never explicitly declares a read replica; under the hood, Neon is adding more Postgres servers to accomplish the autoscaling. So, in other words, it seems like a Neon customer never explicitly configures horizontal scaling. The closest thing to a customer explicitly configuring horizontal scaling is paying for PolyScale for a geographically distributed cache.
absent-sapphire•2y ago
Neon autoscaling is not horizontal. Stock Postgres cannot support horizontal scaling without you specifically making your data model fit. Instead, Neon scales vertically by making use of VM hot-resizing capabilities. Read replicas are not created automatically; you still have to opt into using them by using the read-only connection string. Those do allow for horizontal scaling, since they don't allow writes that could potentially conflict.
A small caveat here is that the underlying storage is horizontally scaled on our pageservers
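As a concrete sketch of that opt-in (hostnames and credentials are placeholders; in Neon the replica is selected simply by using its own connection string):

```python
import psycopg2

# Placeholder endpoints: one read-write primary, one read-only replica.
primary = psycopg2.connect(
    "host=ep-primary.example.neon.tech dbname=main user=app password=*** sslmode=require"
)
replica = psycopg2.connect(
    "host=ep-replica.example.neon.tech dbname=main user=app password=*** sslmode=require"
)

with primary.cursor() as cur:
    cur.execute("INSERT INTO events (kind) VALUES (%s)", ("signup",))
    primary.commit()

with replica.cursor() as cur:     # read traffic scales out on the replica
    cur.execute("SELECT count(*) FROM events")
    print(cur.fetchone())
```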
magic-amber•2y ago
Thanks @Conrad Ludgate, good info here. I asked about the read replica because I was concerned about the price per GB. If read replicas were a normal part of the Neon setup, that would take some planning, because much of the price of Neon seems to be determined by compute hours and disk usage. I think in PlanetScale, a read replica basically doubles your disk usage. So my takeaway is that a read replica is not part of a normal Neon setup.
absent-sapphire•2y ago
Aha, well, we do have clever functionality to make read replicas not need any additional storage
So they are well integrated into Neon and only cost you compute time
magic-amber•2y ago
Wow, that seems like a very differentiating feature. Why not showcase that more? In PlanetScale, I think the way to make low-latency calls around the world is to create read replicas in various regions
absent-sapphire•2y ago
We have some articles on the topic. Essentially, the way Postgres works is through pages and the write-ahead log (WAL). Each WAL entry gets a new 'Log Sequence Number' (LSN). Part of Neon's special sauce is that we can fetch pages as of a given LSN, including in the past (this is how our branching and instant point-in-time recovery features also work).
A traditional read replica has a full copy of the database, listens to the WAL published by the primary, and applies all those changes to its underlying storage. Because Neon already stores the data separately, there's no need to duplicate it. Instead, the replica listens to the WAL for the new LSN to request from, and maybe updates its in-memory caches if necessary. It's very clever stuff
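To make the idea concrete, a toy model (nothing like the actual implementation, just the shape of "give me this page as of this LSN"):

```python
from dataclasses import dataclass, field

@dataclass
class PageStore:
    # (page_id, lsn) -> page image; an append-only history of page versions
    versions: dict = field(default_factory=dict)

    def put(self, page_id: int, lsn: int, image: bytes) -> None:
        self.versions[(page_id, lsn)] = image

    def get_page_at_lsn(self, page_id: int, lsn: int) -> bytes:
        # newest version of the page at or before the requested LSN
        candidates = [l for (p, l) in self.versions if p == page_id and l <= lsn]
        return self.versions[(page_id, max(candidates))]

store = PageStore()
store.put(page_id=1, lsn=100, image=b"v1")
store.put(page_id=1, lsn=200, image=b"v2")
print(store.get_page_at_lsn(1, 150))   # b'v1' -- what a branch or point-in-time read at LSN 150 sees
print(store.get_page_at_lsn(1, 250))   # b'v2' -- what a read replica at the latest LSN sees
```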
Ah, yes. Our storage is region-local, so if you want a multi-region replica you would be duplicating storage (otherwise the latency would not be improved)
We don't yet offer a convenient multi-region replica setup. We are working on our logical replication feature at the moment, which allows for this, but it's not quite ready yet. And it would increase the storage cost per region
magic-amber•2y ago
Ah, it was too good to be true! 🙂
Thanks Conrad, I appreciate the detail. I'm currently deciding between Neon and Supabase myself, and I really like Neon's transparency about their architecture. So, like the OP in this thread, I'm trying to understand the components inside Neon's $1.50/GB price (compared to Supabase) and form a Venn diagram of the features baked into that storage price.
Regardless, I find Neon's blog pretty distinctive, and it gives me confidence that hard-to-test things like PITR actually work. Keep it up!
absent-sapphire•2y ago
Yeah, having done disaster recovery tests against databases in the past, I can say it's a real pain. You have to run through the process regularly to be sure. With Neon it really is a simpler thing to test. It takes a lot off your mind, not even having to think about backups