Dataset.open(..) doesn't initialize the dataset when called outside of the handler
Hi
Due to performance issues, I want to move every possible await out of the handler.
For example here:
router.addHandler('details', async ({ request, page, enqueueLinks, log }) => {
    const data = await page.evaluate(() => {
        // collect data ..
        return collectedData;
    });
    const dataset = await Dataset.open('myData');
    await dataset.pushData(data);
});
I want to move the dataset initialization out of the handler, like this:
const dataset = await Dataset.open('myData');
router.addHandler('details', async ({ request, page, enqueueLinks, log }) => {
    const data = await page.evaluate(() => {
        // collect data ..
        return collectedData;
    });
    await dataset.pushData(data);
});
But now the dataset is not initialized when Crawlee starts: the folder ./storage/datasets/myData is not created, and I get this log:
WARN PuppeteerCrawler: Reclaiming failed request back to the list or queue. Dataset with id: e4901ade-57c3-49ec-8300-5a96338d381b does not exist.
How can I properly initialize the dataset in this case?
Thank you
Cheers
GT
2 Replies
flat-fuchsia•3y ago
You can set a new property on the crawler with Object.defineProperty() [1].
For example:
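The snippet from this reply is not reproduced here. As a minimal sketch of the idea only, assuming a plain object as a stand-in for the crawler instance and a hypothetical property name `dataset` (neither comes from the original reply), attaching a shared value could look like this:

```javascript
// Stand-in for a crawler instance; in a real project this would be the
// object returned by `new PuppeteerCrawler(...)` (assumption for illustration).
const crawler = { name: 'demo-crawler' };

// Hypothetical shared resource that handlers should reach through the crawler.
const sharedDataset = {
    items: [],
    pushData(item) {
        this.items.push(item);
    },
};

// Define a read-only, non-enumerable `dataset` property on the crawler object.
Object.defineProperty(crawler, 'dataset', {
    value: sharedDataset,
    writable: false,
    enumerable: false,
    configurable: false,
});

// Any code holding the crawler can now reach the shared resource.
crawler.dataset.pushData({ url: 'https://example.com' });
console.log(crawler.dataset.items.length);
console.log(Object.keys(crawler));
```

Because the property is non-enumerable, it does not show up in `Object.keys(crawler)`, which keeps it out of the way of code that iterates over the object's own keys.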
[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/defineProperty
Object.defineProperty() - JavaScript | MDN
The Object.defineProperty() static method defines a new property directly on an object, or modifies an existing property on an object, and returns the object.
optimistic-goldOP•3y ago
Thanks to your example, I realized that Dataset.open() must be called after creating the crawler.
const crawler = new PuppeteerCrawler(..);
const myData = await Dataset.open(..);
await crawler.run(..);
Thank you ;]
Earlier it was called before the crawler was created.
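If the ordering is easy to get wrong, another option is to open the dataset lazily and cache the promise, so that only the first handler call pays for the open and it happens after the crawler has started. The following is a sketch under assumptions: `openDataset` is a hypothetical stand-in for `Dataset.open('myData')`, and `getDataset` and `handleDetails` are helper names invented here, not part of Crawlee.

```javascript
// Hypothetical stand-in for `Dataset.open('myData')`; counts how often it runs.
let openCalls = 0;
async function openDataset(name) {
    openCalls += 1;
    const items = [];
    return {
        name,
        items,
        async pushData(item) {
            items.push(item);
        },
    };
}

// Cache the promise (not the resolved value) so concurrent callers share one open.
let datasetPromise = null;
function getDataset() {
    if (datasetPromise === null) {
        datasetPromise = openDataset('myData');
    }
    return datasetPromise;
}

// Simulated handler body: the first call opens, later calls reuse the promise.
async function handleDetails(data) {
    const dataset = await getDataset();
    await dataset.pushData(data);
}

async function main() {
    // Two "requests" handled concurrently still trigger a single open.
    await Promise.all([handleDetails({ id: 1 }), handleDetails({ id: 2 })]);
    const dataset = await getDataset();
    console.log(openCalls, dataset.items.length);
    return { openCalls, count: dataset.items.length };
}

const result = main();
```

Caching the promise rather than the resolved dataset matters when handlers run concurrently: every caller that arrives before the open finishes awaits the same pending promise instead of starting its own.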