Only-once storage
Hello all,
I’m trying to understand how Crawlee uses storage a little better, and I have a question about it:
Crawlee truncates the storage of all indexed pages every time I run. Is there a way to not have it do that? Almost like using it as an append-only log for new items found.
Worst case, I can keep an in-memory record of all pages and simply not write to disk when I see a page that is already recorded. Curious what the best practices are here.
2 Replies
This post was marked as solved by royrusso.
flat-fuchsia (OP) • 2mo ago
Ignore. I was calling datastore.dump() and then wondered why purgeonstart wasn’t working. :perfecto:
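For anyone landing here with the same question: the fallback described in the original post (remember every page already seen, and only write new ones) can be sketched without any Crawlee API at all. The snippet below is a minimal sketch using only Node's standard library. The `AppendOnlyUrlLog` class, the `pages.jsonl` file name, and the `url` field are all hypothetical names chosen for illustration; none of them come from Crawlee. It assumes each item carries a unique `url` and that one process writes to the file at a time.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper (not part of Crawlee): an append-only JSONL log
// that records each URL at most once, even across separate runs.
class AppendOnlyUrlLog {
  private readonly filePath: string;
  private readonly seen = new Set<string>();

  constructor(filePath: string) {
    this.filePath = filePath;
    // Rebuild the in-memory "seen" set from lines written by earlier runs.
    if (fs.existsSync(filePath)) {
      for (const line of fs.readFileSync(filePath, "utf8").split("\n")) {
        if (line.trim() === "") continue;
        const record = JSON.parse(line) as { url: string };
        this.seen.add(record.url);
      }
    }
  }

  // Appends the item only if its URL is new; returns true when a line was written.
  add(item: { url: string; [key: string]: unknown }): boolean {
    if (this.seen.has(item.url)) return false;
    fs.appendFileSync(this.filePath, JSON.stringify(item) + "\n");
    this.seen.add(item.url);
    return true;
  }

  get size(): number {
    return this.seen.size;
  }
}

// Usage: two "runs" sharing one log file in a temporary directory.
const logDir = fs.mkdtempSync(path.join(os.tmpdir(), "url-log-"));
const logPath = path.join(logDir, "pages.jsonl");

const firstRun = new AppendOnlyUrlLog(logPath);
const wroteNew = firstRun.add({ url: "https://example.com/a", title: "A" });
const wroteDuplicate = firstRun.add({ url: "https://example.com/a", title: "A again" });

// A later run reloads the file, so page "a" is still known and only "b" is appended.
const secondRun = new AppendOnlyUrlLog(logPath);
const wroteSeenEarlier = secondRun.add({ url: "https://example.com/a" });
const wroteB = secondRun.add({ url: "https://example.com/b", title: "B" });
```

Because the file is only ever appended to, nothing from an earlier run gets truncated. The trade-off is that the whole file is read at startup to rebuild the set, which is fine for modest crawls but may need a proper index for very large ones.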