Storage of data or returning of results

Hello, this shouldn't take long.
Am I reading the docs correctly (I have also tested this) that returning results through a promise or a callback isn't an option with this SDK (Crawlee with new PlaywrightCrawler(), for example)? Can we only write to Datasets and retrieve the data later?
8 Replies
absent-sapphire, 3y ago
Yes: either push data to a dataset or set a value in the key-value store (directly or through state management). Anything else is outside the SDK. Temporary results can be attached to userData on the request object.
optimistic-gold (OP), 3y ago
Thanks for your response! I'll explore attaching temp results to the request object as well.
wise-white, 3y ago
If you're trying to store a global state that's accessible to the entire run, I recommend using useState https://crawlee.dev/api/basic-crawler/class/BasicCrawler#useState
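A minimal sketch of the shared-state idea, for anyone landing here later. `StateProvider` is a hypothetical stand-in for the one crawler method used (it assumes `useState(defaultValue)` resolves to the same shared object on every call), and `RunState`, `recordTitle`, and `readTitles` are illustrative names, not part of Crawlee.

```typescript
// Stand-in for the part of the crawler used here (assumption: mirrors
// `crawler.useState(defaultValue)`, which resolves to a shared object).
interface StateProvider {
  useState<T extends object>(defaultValue: T): Promise<T>;
}

// Hypothetical shape of the state kept for the whole run.
interface RunState {
  titles: string[];
}

// Meant to be called from inside a request handler: appends to the shared state.
async function recordTitle(crawler: StateProvider, title: string): Promise<void> {
  const state = await crawler.useState<RunState>({ titles: [] });
  state.titles.push(title);
}

// After the run, the same call hands back the accumulated state.
async function readTitles(crawler: StateProvider): Promise<string[]> {
  const state = await crawler.useState<RunState>({ titles: [] });
  return state.titles;
}
```

With a real crawler you would pass the crawler instance itself, provided its `useState` matches the shape assumed above.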
optimistic-gold (OP), 3y ago
yes, thanks for this!
mute-gold, 3y ago
Crawlee doesn't force you to store data the default way. You can override the push function or write your own callback.
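A rough sketch of the callback approach, assuming the results only need to live in memory for the current run. `PageContext` is a hypothetical stand-in for the context object a request handler receives, reduced to the one field used here; `createCollectingHandler` and `runAndCollect` are illustrative names. The handler reports each result through your callback instead of pushing it to a dataset, and the caller gets everything back from a promise.

```typescript
// Stand-in for the crawling context; only the field used below (assumption).
interface PageContext {
  request: { url: string };
}

// Hypothetical result shape for this example.
interface Result {
  url: string;
  length: number;
}

// Builds a handler that reports each result through a callback
// instead of writing it to a Dataset.
function createCollectingHandler(onResult: (result: Result) => void) {
  return async (ctx: PageContext): Promise<void> => {
    onResult({ url: ctx.request.url, length: ctx.request.url.length });
  };
}

// Runs the handler over some contexts and resolves with all results.
// With a real crawler, awaiting the crawler run would replace this loop.
async function runAndCollect(contexts: PageContext[]): Promise<Result[]> {
  const results: Result[] = [];
  const handler = createCollectingHandler((result) => {
    results.push(result);
  });
  for (const ctx of contexts) {
    await handler(ctx);
  }
  return results;
}
```

The trade-off is that nothing is persisted: if the process dies mid-run, the in-memory array is gone, which is what the dataset and key-value store are there to prevent.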
optimistic-gold (OP), 3y ago
thanks @thek1tten your response got me exactly where I wanted with a little help from here: https://github.com/apify/crawlee/blob/e0f6e628596dcca3da14e31a68d04b7b11316fb2/docs/upgrading/upgrading_v3.md#auto-saved-crawler-state
passive-yellow, 3y ago
I don't understand the datasets. Why are the results stored as separate files? How do you loop through all the results later? I want to store an array of objects and loop through it later in the code.
mute-gold, 3y ago
You can load the dataset items and then loop through them and store as you wish.
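Roughly like this, as a sketch. `DatasetLike` is a hypothetical stand-in for the dataset object, assuming it exposes a `getData()` method that resolves to an object with an `items` array; `Product` and `totalPrice` are made-up names for the example.

```typescript
// Minimal shape of the dataset object used here (assumption: `getData()`
// resolves to an object whose `items` property holds all stored records).
interface DatasetLike<T> {
  getData(): Promise<{ items: T[] }>;
}

// Hypothetical record type for this example.
interface Product {
  title: string;
  price: number;
}

// Loads every stored item into one array and loops over it.
async function totalPrice(dataset: DatasetLike<Product>): Promise<number> {
  const { items } = await dataset.getData();
  let total = 0;
  for (const item of items) {
    total += item.price;
  }
  return total;
}
```

Once `items` is loaded it is a plain array, so you can map, filter, or write it out as a single file however you like.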
