How to clear named KeyStores before every run?

I have this function
import fs from 'node:fs';
import { KeyValueStore } from 'crawlee';

async function initialSetup() {
    // Clear previous data sets
    const storeKeys = ["categories_store", "details_store"]

    for (const storeKey of storeKeys) {
        try {
            const store = await KeyValueStore.open(storeKey)
            await store.drop()
        } catch (error) {
            console.log(error)
            continue
        }
    }

    // Create outputs directory if it doesn't exist
    if (!fs.existsSync("outputs")) {
        fs.mkdirSync("outputs")
    }
}
It is the first thing that runs before I start my crawler. It works as expected and drops both key-value stores. But when I try to write fresh data to these stores again with this code
const categoriesStore = await KeyValueStore.open("categories_store")
await categoriesStore.setValue("categories", categories)
I get this error
INFO PuppeteerCrawler: Starting the crawl
INFO PuppeteerCrawler: enqueueing new URLs
Error: Key-value store with id: 6c47506c-2c01-4a6a-9eaa-567ee6f58e96 does not exist.
at KeyValueStoreClient.throwOnNonExisting (C:\scrape_crawlee\node_modules\@crawlee\src\resource-clients\common\base-client.ts:11:15)
at KeyValueStoreClient.setRecord (C:\scrape_crawlee\node_modules\@crawlee\src\resource-clients\key-value-store.ts:222:18)
I was expecting it to create a new store if one didn't exist, but for some reason it doesn't, and I'm a bit lost with this error. Any help would be appreciated!
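A workaround sketch (my assumption, not something confirmed in this thread): since the error suggests a handle to the dropped store is being reused, you could avoid `drop()` entirely and instead delete the individual records, which Crawlee's `KeyValueStore.setValue` supports by passing `null` as the value. The record key `"details"` is an assumption here; only `"categories"` appears in the snippets above.

```typescript
import { KeyValueStore } from 'crawlee';

// Sketch: empty the named stores instead of dropping them, so the
// stores themselves keep existing and later open() calls reuse them.
// The "details" record key is hypothetical.
async function clearStores(): Promise<void> {
    const categoriesStore = await KeyValueStore.open('categories_store');
    // In Crawlee, setValue(key, null) deletes the record with that key.
    await categoriesStore.setValue('categories', null);

    const detailsStore = await KeyValueStore.open('details_store');
    await detailsStore.setValue('details', null);
}
```

If a stale store handle really is the cause, this sidesteps it because the stores are never deleted, only emptied.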
9 Replies
exotic-emerald
exotic-emerald•3y ago
This looks like a bug, will report
exotic-emerald
exotic-emerald•3y ago
If you could report an issue here and follow the steps, it would help a lot. Thank you https://github.com/apify/crawlee/issues
fair-rose
fair-rose•3y ago
It doesn't seem to be a bug. Here is an example showing that it works as expected.
fair-rose
fair-rose•3y ago
optimistic-gold
optimistic-goldOP•3y ago
I should've mentioned it in the original post: I don't have an inline request handler; instead I import a router from a routes.ts file, which looks like this
import { createPuppeteerRouter, RequestOptions, createHttpRouter, KeyValueStore, useState } from 'crawlee';

export const categorySlugRouter = createPuppeteerRouter();

categorySlugRouter.addDefaultHandler(async ({ page, log }) => {
    log.info(`enqueueing new URLs`);

    const categories = await page.evaluate(() => {
        const slugs: string[] = []

        try {
            // Web scraping code...

            return slugs
        } catch (error) {
            return []
        }
    })

    // Where the error occurs
    const categoriesStore = await KeyValueStore.open("categories_store")
    await categoriesStore.setValue("categories", categories)
});
Then in main.ts I set it up like this
import { PuppeteerCrawler } from 'crawlee';
import { categorySlugRouter } from './routes.js';

async function scrapeCategorySlugs() {
    const startUrls = ['http://www.example.com'];

    const categorySlugCrawler = new PuppeteerCrawler({
        requestHandler: categorySlugRouter,
    });

    await categorySlugCrawler.run(startUrls);
}
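One more thing worth ruling out (an assumption on my part, not confirmed in the thread): if anything opens the stores before `initialSetup()` finishes dropping them, the handle obtained earlier can point at a store that no longer exists. A minimal entry-point sketch that enforces the ordering, reusing the two functions shown above:

```typescript
// Hypothetical main.ts ordering: drop the stores first, then crawl.
// initialSetup and scrapeCategorySlugs are the functions shown above.
async function main(): Promise<void> {
    await initialSetup();        // drops the named stores
    await scrapeCategorySlugs(); // re-opens them inside the route handler
}

main().catch((err) => {
    console.error(err);
    process.exit(1);
});
```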
fair-rose
fair-rose•3y ago
Another example showing that it works as expected with a Puppeteer router.
optimistic-gold
optimistic-goldOP•3y ago
I'll test this with a new project and see if I can still reproduce it
MEE6
MEE6•3y ago
@Vrockz just advanced to level 1! Thanks for your contributions! 🎉
exotic-emerald
exotic-emerald•3y ago
@LeMoussel Thanks for helping!