Storage of data or returning of results
Hello, this shouldn't take long.
Am I reading this correctly (I have also tested it): returning results to a promise or a callback isn't an option with this SDK (Crawlee with new PlaywrightCrawler(), for example)? Can we only write to Datasets and retrieve the data later for use?
8 Replies
absent-sapphire•3y ago
Yes, either push data to a dataset or set a value in the KV store (directly or through state management); everything else is outside the SDK. Temporary results can be added to userData on the request object.
optimistic-goldOP•3y ago
Thanks for your response! I'll explore attaching temp results to the request object as well.
wise-white•3y ago
If you're trying to store global state that's accessible to the entire run, I recommend using useState: https://crawlee.dev/api/basic-crawler/class/BasicCrawler#useState
optimistic-goldOP•3y ago
yes, thanks for this!
mute-gold•3y ago
Crawlee doesn't force you to store data the default way. You can override the push function or write your own callback.
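
A minimal sketch of the "write your own callback" idea, assuming the goal is to get results back from a promise instead of a Dataset. The names collectResults, Emit and fakeCrawl are hypothetical and not part of Crawlee; fakeCrawl only stands in for a real crawl so the snippet runs without a browser.

```typescript
// Hypothetical callback type: handlers call this instead of pushing to a Dataset.
type Emit<T> = (item: T) => void;

// Hypothetical helper (not a Crawlee API): runs a crawl and resolves with
// every item the handlers emitted while it was running.
async function collectResults<T>(
    runCrawl: (emit: Emit<T>) => Promise<unknown>,
): Promise<T[]> {
    const results: T[] = [];
    await runCrawl((item) => {
        results.push(item);
    });
    return results;
}

// Stand-in for a real crawl. With Crawlee, this function would instead create
// a PlaywrightCrawler whose requestHandler calls emit(...) and then return
// crawler.run(urls), so the promise settles when the crawl finishes.
async function fakeCrawl(emit: Emit<{ url: string; title: string }>): Promise<void> {
    emit({ url: 'https://example.com/a', title: 'A' });
    emit({ url: 'https://example.com/b', title: 'B' });
}
```

Calling `await collectResults(fakeCrawl)` then gives a plain array to work with. Note that results collected this way live only in memory, so they are lost if the process restarts mid-run, which the Dataset and useState approaches are meant to avoid.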
optimistic-goldOP•3y ago
Thanks @thek1tten, your response got me exactly where I wanted, with a little help from here: https://github.com/apify/crawlee/blob/e0f6e628596dcca3da14e31a68d04b7b11316fb2/docs/upgrading/upgrading_v3.md#auto-saved-crawler-state
passive-yellow•3y ago
I don't understand the datasets. Why do you store the results as separate files?
How do you loop through all the results later?
I want to store an array of objects and loop through it later in the code.
mute-gold•3y ago
You can load the dataset items and then loop through them and store as you wish.
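
A rough sketch of what that could look like. DatasetLike and loadItems are hypothetical names used here so the snippet runs on its own; the assumption is that the object returned by Crawlee's Dataset.open() fits this shape, since its getData() resolves to an object with an items array.

```typescript
// Minimal shape this sketch needs. Assumption: a Crawlee Dataset satisfies it.
interface DatasetLike<T> {
    getData(): Promise<{ items: T[] }>;
}

// Hypothetical helper: load every stored item into one in-memory array.
async function loadItems<T>(dataset: DatasetLike<T>): Promise<T[]> {
    const { items } = await dataset.getData();
    return items;
}

// In-memory stand-in so the example runs without Crawlee installed.
// With Crawlee, this would instead be: const dataset = await Dataset.open();
const fakeDataset: DatasetLike<{ url: string; price: number }> = {
    async getData() {
        return {
            items: [
                { url: 'https://example.com/a', price: 10 },
                { url: 'https://example.com/b', price: 25 },
            ],
        };
    },
};

// Loop through the loaded items like any other array of objects.
async function totalPrice(): Promise<number> {
    const items = await loadItems(fakeDataset);
    let total = 0;
    for (const item of items) {
        total += item.price;
    }
    return total;
}
```

The separate files are just how the default local storage persists each pushed item; once loaded, the items behave like a normal array.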