Dataset.open(..) doesn't init the dataset when called outside of the handler

Hi,

Due to performance issues, I want to move every possible await out of the handler. For example, here:

router.addHandler('details', async ({request, page, enqueueLinks, log}) => {
  const data = await page.evaluate(() => {
    // collect data ..
    return collectedData;
  });
  const dataset = await Dataset.open('myData');
  await dataset.pushData(data);
})

I want to move the dataset init out of the handler, like this:

const dataset = await Dataset.open('myData');
router.addHandler('details', async ({request, page, enqueueLinks, log}) => {
  const data = await page.evaluate(() => {
    // collect data ..
    return collectedData;
  });
  await dataset.pushData(data);
})

But now the dataset is not initialised when Crawlee starts. The folder ./storage/datasets/myData is not created, and I get this log:

WARN PuppeteerCrawler: Reclaiming failed request back to the list or queue. Dataset with id: e4901ade-57c3-49ec-8300-5a96338d381b does not exist.

How can I properly init the dataset in this case?

Thank you
Cheers
GT
2 Replies
flat-fuchsia, 3y ago
You can set a new property on the crawler with Object.defineProperty() [1]. For example:
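router.addHandler('details', async ({ request, page, enqueueLinks, log, crawler }) => {
  const data = await page.evaluate(() => {
    // collect data ..
    return collectedData;
  });
  // `this` is not the crawler inside an arrow function, so read the
  // property from the `crawler` instance passed in the handler context.
  if (Object.prototype.hasOwnProperty.call(crawler, 'mydataset')) {
    await crawler.mydataset.pushData(data);
  }
});

// Open the dataset after the crawler has been created, then attach it.
const dsMyData = await Dataset.open('myData');
Object.defineProperty(crawler, 'mydataset', {
  value: dsMyData,
});
const crawlStats = await crawler.run([{ url: 'https://crawlee.dev' }]);
log.info('Crawl finished', crawlStats);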
[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/defineProperty
optimistic-gold (OP), 3y ago
Thanks to your example, I realized that the dataset must be opened after creating the crawler:

const crawler = new PuppeteerCrawler(..);
const myData = await Dataset.open(..);
await crawler.run(..);

Thank you ;] Earlier it was called before the crawler was created.
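Another way to keep the repeated open out of the handler, without depending on where the top-level call sits, is to open the dataset lazily on first use and reuse the same promise afterwards. The following is only a minimal sketch: `lazyOnce`, `openDatasetStub`, `getDataset` and `handleDetails` are hypothetical names, and the stub stands in for `Dataset.open('myData')` so the snippet runs without Crawlee installed. It assumes the first call happens inside a handler, which is after the crawler has been created.

```javascript
// Hypothetical helper: run an async initializer at most once and reuse its promise.
function lazyOnce(init) {
  let promise = null;
  return () => {
    if (promise === null) {
      promise = init();
      // If the initializer rejects, forget the promise so a later call can retry.
      promise.catch(() => { promise = null; });
    }
    return promise;
  };
}

// Stand-in for Dataset.open(name); NOT the Crawlee API, just enough to run the sketch.
let openCalls = 0;
async function openDatasetStub(name) {
  openCalls += 1;
  const items = [];
  return {
    name,
    items,
    pushData: async (item) => { items.push(item); },
  };
}

// With Crawlee this would presumably be: lazyOnce(() => Dataset.open('myData'))
const getDataset = lazyOnce(() => openDatasetStub('myData'));

// Body of a request handler: the first call opens the dataset,
// every later call awaits the already settled promise.
async function handleDetails(data) {
  const dataset = await getDataset();
  await dataset.pushData(data);
}
```

The handler still contains an await, but after the first request it resolves immediately, so the cost of opening the dataset is paid only once.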