RocksDB swaps?
I am new to TypeDB. I was wondering if there have been any discussions in the past around providing mechanisms to plug in our own storage engine? Also, the entire query processing pipeline seems to be written in non-async Rust. Was this choice made with the RocksDB storage engine in mind?
Hi @rajivranganath ! Great questions!
We only support RocksDB right now (and previously RocksDB drop-ins like Speedb, before they got acquired!). It's not inconceivable to allow another ordered key-value store, though we might have to rework the checkpointing mechanism, which leverages RocksDB's efficient checkpointing! Did you have something specific in mind?
And yes, it's non-async right now, as the RocksDB bindings don't leverage io_uring or other async IO operations, so we built the model to be threaded on top.
I do like the idea of shifting it to be async top to bottom at some point. The other tradeoff is that debugging and performance-optimizing threaded code is, in my opinion, much easier than async Rust.
Yes, I would like to explore using TypeDB along with FoundationDB (which is a distributed KV store with ACID support).
I had previously written (now not actively maintained) Rust bindings for FDB.
I would be interested in exploring whether a TypeDB "layer" can be developed on top of FDB.
I agree. Btw, you might want to take a look at this (in case you are not already aware).
https://datafusion.apache.org/blog/2025/06/30/cancellation/
That's very interesting! We have a moderately similar approach that is threaded, where we pass batches around - though it's not async, and execution itself is single-threaded (more of a stack machine). When we get back to the execution engine I definitely want to make each step in the computation concurrent!
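To make that concrete, here is a minimal, hypothetical sketch of a pull-based, batch-at-a-time pipeline in plain Rust (not TypeDB's actual executor; all names are made up), where each step pulls batches from the one below it:

```rust
// Illustrative sketch only: a pull-based pipeline where each step
// produces batches of rows for the step above it. Not TypeDB's engine.

type Batch = Vec<u64>;

trait Step {
    /// Produce the next batch of rows, or None when exhausted.
    fn next_batch(&mut self) -> Option<Batch>;
}

/// Leaf step: yields fixed-size batches from an in-memory source.
struct Scan {
    data: Vec<u64>,
    pos: usize,
    batch_size: usize,
}

impl Step for Scan {
    fn next_batch(&mut self) -> Option<Batch> {
        if self.pos >= self.data.len() {
            return None;
        }
        let end = (self.pos + self.batch_size).min(self.data.len());
        let batch = self.data[self.pos..end].to_vec();
        self.pos = end;
        Some(batch)
    }
}

/// Filter step: pulls batches from its input and keeps matching rows.
struct Filter<F: FnMut(&u64) -> bool> {
    input: Box<dyn Step>,
    predicate: F,
}

impl<F: FnMut(&u64) -> bool> Step for Filter<F> {
    fn next_batch(&mut self) -> Option<Batch> {
        // Keep pulling until we produce a non-empty batch or the input ends.
        while let Some(batch) = self.input.next_batch() {
            let out: Batch = batch.into_iter().filter(|v| (self.predicate)(v)).collect();
            if !out.is_empty() {
                return Some(out);
            }
        }
        None
    }
}

fn main() {
    let scan = Scan { data: (0..100).collect(), pos: 0, batch_size: 16 };
    let mut plan = Filter { input: Box::new(scan), predicate: |v: &u64| *v % 7 == 0 };
    while let Some(batch) = plan.next_batch() {
        println!("{batch:?}");
    }
}
```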
That would be interesting! Basically, we need three operators:
- get(key)
- insert(key, value)
- range_scan(prefix bytes)
We currently implement deletes as inserts with a tombstone, but at some point we will need to implement a background cleanup that has RocksDB actually delete them. In RocksDB this is 'efficient' because compaction processes these delete events properly when it runs... A) I'm not sure how that would look in FDB.
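To make that surface concrete, here is a rough sketch of the ordered key-value interface those three operators imply, with delete expressed as a tombstone insert as described above. The trait, names, and tombstone encoding are hypothetical, not TypeDB's actual code:

```rust
// Hypothetical sketch of the ordered key-value surface described above.
// Names, types, and the tombstone encoding are illustrative only.

type Key = Vec<u8>;
type Value = Vec<u8>;

/// Sentinel value marking a deleted key until some background cleanup
/// (or the engine's own compaction) physically removes the entry.
const TOMBSTONE: &[u8] = &[0xFF];

trait OrderedKv {
    fn get(&self, key: &[u8]) -> Option<Value>;
    fn insert(&mut self, key: Key, value: Value);
    /// Iterate entries whose key starts with `prefix`, in key order;
    /// callers are expected to skip tombstoned values.
    fn range_scan(&self, prefix: &[u8]) -> Box<dyn Iterator<Item = (Key, Value)> + '_>;

    /// Delete-as-insert: write a tombstone instead of removing the key.
    fn delete(&mut self, key: Key) {
        self.insert(key, TOMBSTONE.to_vec());
    }
}
```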
In addition, we need a (moderately efficient) way to checkpoint a TypeDB database (note: we have 5 RocksDB databases powering 1 TypeDB database under the hood, each with different optimizations). In essence:
1) commits are written and synced to the WAL
2) data batches are written non-durably to RocksDB databases
In the background, a thread "checkpoints" TypeDB using RocksDB's checkpoint mechanism, which is super efficient: it just hardlinks the existing RocksDB data files into a directory - we can do this about every second. Then on bootup, we restore the last checkpoint and replay the WAL from that point in time.
Therefore, for FDB or any other storage layer we'd have to B) think about how to do bootup/checkpointing efficiently. Of course, the naive approach of always replaying the entire WAL works. The naive solution of using durable writes to the KV storage layer may or may not work if the writes are also split across multiple KV storage instances, as we do with RocksDB.
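For illustration, the recovery path described above could look roughly like the following (entirely schematic; the types and names are mine, not TypeDB's):

```rust
// Schematic boot/recovery flow: the restored checkpoint already covers all
// WAL records up to some sequence number, so only the newer tail is replayed.

struct WalRecord {
    sequence: u64,
    payload: Vec<u8>,
}

struct Checkpoint {
    /// Highest WAL sequence number already reflected in the checkpointed files
    /// (the hardlinked RocksDB data files, in the current implementation).
    sequence: u64,
}

fn recover(
    latest_checkpoint: &Checkpoint,
    wal: impl Iterator<Item = WalRecord>,
    mut apply: impl FnMut(&WalRecord),
) {
    // Step 1 (assumed done before this call): restore the checkpoint directory
    // as the working database state.
    // Step 2: replay only the WAL records the checkpoint does not cover.
    for record in wal {
        if record.sequence > latest_checkpoint.sequence {
            apply(&record);
        }
    }
}
```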
Here are the implementations for my bindings (other bindings would also have similar APIs).
* get(key) - https://docs.rs/fdb/0.3.1/fdb/transaction/trait.ReadTransaction.html#tymethod.get
* insert(key, value) - https://docs.rs/fdb/0.3.1/fdb/transaction/trait.Transaction.html#tymethod.set
* range_scan(prefix_bytes) - https://docs.rs/fdb/0.3.1/fdb/transaction/trait.ReadTransaction.html#tymethod.get_range
Regarding deletes, the two APIs are
clear (https://docs.rs/fdb/0.3.1/fdb/transaction/trait.Transaction.html#tymethod.clear) and clear_range (https://docs.rs/fdb/0.3.1/fdb/transaction/trait.Transaction.html#tymethod.clear_range).
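One detail worth calling out when mapping range_scan(prefix) onto get_range: FDB range reads take a begin/end key pair rather than a prefix, so the prefix has to be converted into an exclusive upper bound (what FDB calls "strinc"). A small, binding-agnostic sketch of that conversion:

```rust
/// Compute the exclusive upper bound for scanning every key that starts
/// with `prefix`: drop trailing 0xFF bytes, then increment the last byte.
/// Returns None if the prefix is empty or all 0xFF (no finite bound exists).
fn prefix_range_end(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut end = prefix.to_vec();
    while let Some(&last) = end.last() {
        if last == 0xFF {
            end.pop();
        } else {
            *end.last_mut().unwrap() += 1;
            return Some(end);
        }
    }
    None
}

fn main() {
    // Hypothetical prefix for all "person" keys.
    let begin = b"person/".to_vec();
    let end = prefix_range_end(&begin).expect("prefix has a finite upper bound");
    assert_eq!(end, b"person0".to_vec()); // '0' is '/' + 1 in ASCII
    // A binding-level scan would then read keys in [begin, end), e.g. via get_range.
}
```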
It is up to the storage engines in FDB as to how they implement deletes. Currently there are two "main" storage engines - Redwood (a B+ tree, used by Snowflake in their internal fork of FDB, though the storage engine is available to the community) and RocksDB (an LSM tree, used by Apple).
Regarding the WAL and multiple storage instances, FDB uses an unbundled architecture with a separate transaction system, log system (WAL), and storage servers. I would recommend checking out https://www.foundationdb.org/files/fdb-paper.pdf to see how everything comes together.
Currently I'm working on other aspects of FDB (deployment, upgrades, monitoring/logging/alerting, performance analysis, security, etc.). I hope to get back to layer development in a few months and revive the crate.
If there is interest in exploring this further (and adding async support to the query engine), I can spend my free time learning about TypeDB and see how to create a TypeDB layer on top of FDB.
Here is another paper that might be interesting to you - https://www.foundationdb.org/files/record-layer-paper.pdf
Thanks, that's very interesting! I think there's definitely interest in the broader community for a swappable storage layer - and not just FoundationDB. In particular, something that can be compiled to WASM for browser/local/edge deployments has come up as well.
I'd be interested in taking a bit of time to see if I can build a cleaner layer split - that would then allow you to investigate integrating alternatives?
In terms of mainlining and promoting that kind of a change in our central repository - we had a discussion and think that our standard for mainlining alternative storage engines, or changes in that direction, will be quite high. However, we're happy to point to community forks and projects working on those kinds of things, provide guidance, etc., until that standard is reached.
Hope that makes sense!
It's also interesting that we've kind of designed the core storage layer to be in line with the Calvin distributed storage model, which is already closer to how FoundationDB works.
Thanks for the reply!
Yes please take your time. I still need to learn TypeDB and finish my existing backlog on the tooling needed to operationalize FDB.
Could you also please see if there is a possibility of reusing DataFusion libraries along with TypeDB?
DataFusion supports customizing frontends (TypeDB could be one such frontend), catalog providers, adding custom nodes to logical and physical plans, defining your own optimization rules, etc.
Depending on the context (local, edge, cloud), different execution operators, optimization rules, and catalog providers can be used.
From a storage engine perspective, we would need to focus on just implementing the physical operators for a particular storage engine and providing the required catalog information.
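For concreteness, here is a minimal sketch of what embedding DataFusion looks like from the host side, using an in-memory MemTable purely as a stand-in for where a TypeDB/FDB-backed TableProvider and a TypeQL frontend would eventually plug in (the table name and data are made up):

```rust
use std::sync::Arc;

use datafusion::arrow::array::{ArrayRef, Int64Array, StringArray};
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::datasource::MemTable;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // Schema and data for a toy "person" table; a real integration would
    // instead implement TableProvider over the underlying storage engine.
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("name", DataType::Utf8, false),
    ]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![
            Arc::new(Int64Array::from(vec![1, 2])) as ArrayRef,
            Arc::new(StringArray::from(vec!["alice", "bob"])) as ArrayRef,
        ],
    )?;

    // Register the table with the session's catalog and run a query.
    // A TypeQL frontend would target the same logical/physical plan layer
    // instead of going through SQL.
    let ctx = SessionContext::new();
    ctx.register_table("person", Arc::new(MemTable::try_new(schema, vec![vec![batch]])?))?;
    ctx.sql("SELECT name FROM person WHERE id = 1").await?.show().await?;
    Ok(())
}
```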
I had explored DataFusion previously, but was unable to proceed due to the blocking operator issue (https://datafusion.apache.org/blog/2025/06/30/cancellation/) that I shared earlier. Blocking operators are not compatible with FDB, as it imposes very strict limits on transaction size and duration. Now that this seems to have been fixed, using DataFusion with FDB looks feasible.