Saving data in an Apify Actor and cleaning it

I've tried saving the data I scrape from my actors to a rawData.json file, but I don't get a JSON output even though the scraping works. How would I save the data to the Apify console so that I can then use MongoDB to take that data and put it in my database? (I have my MongoDB schema already set up.) Would I have to save it to an Apify dataset, and if so, how? And how would I also put it through a cleaning process in the same actor, or if possible a different actor, and THEN save it to a MongoDB database? Here's what I have for saving the JSON file so far:
conscious-sapphire (OP) · 2y ago
import fs from 'fs';

bambawRouter.addHandler('BAMBAW_PRODUCT', async ({ page, request }) => {
    try {
        console.log('Scraping products');

        const site = 'Bambaw';

        const title = await page.$eval('h1.product__title', (el) => el.textContent?.trim() || '');

        const descriptions = await ......

        const productData = {
            url: request.loadedUrl,
            site,
            title,
            descriptions,
            originalPrice,
            salePrice,
            shippingInfo,
            reviewScore,
            reviewNumber,
        };

        productList.push(productData);

        console.log('Scraped', productList.length, 'products');

        // Read the existing data from the rawData.json file
        let rawData: any = {};
        try {
            const rawDataStr = fs.readFileSync('rawData.json', 'utf8');
            rawData = JSON.parse(rawDataStr);
        } catch (error) {
            console.log('Error reading rawData.json:', error);
        }

        // Append the new data to the existing data
        if (rawData.productList) {
            rawData.productList.push(productData);
        } else {
            rawData.productList = [productData];
        }

        // Write the updated data back to the rawData.json file
        fs.writeFileSync('rawData.json', JSON.stringify(rawData, null, 2));
        console.log('rawData.json updated for Bambaw');
    } catch (error) {
        console.log('Error scraping product:', error);
        bambawQueue.reclaimRequest(request);
        return;
    }
});
Pepa J · 2y ago
Hmm... this should generally work... The question might be where the file is saved. You can find examples for working with Dataset here: https://crawlee.dev/api/core/class/Dataset (this will generate a new file in the storage folder for each item in the dataset). You should even be able to send it to MongoDB directly, depending on your use case.
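
For illustration, a minimal sketch of the same handler pushing to the dataset instead of a local file. The router name, route label, and title selector are taken from the snippet above; the other product fields are omitted, and it assumes Crawlee v3:

// Minimal sketch: push each item to the run's default dataset.
import { Dataset } from 'crawlee';

bambawRouter.addHandler('BAMBAW_PRODUCT', async ({ page, request }) => {
    const title = await page.$eval('h1.product__title', (el) => el.textContent?.trim() || '');

    // Each pushData() call appends one JSON item to the default dataset,
    // which appears in the Apify console under Storage -> Datasets.
    await Dataset.pushData({
        url: request.loadedUrl,
        site: 'Bambaw',
        title,
    });
});

On the platform the dataset can then be exported as JSON or CSV from the console, or opened from another Actor for a cleaning step.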
conscious-sapphire (OP) · 2y ago
Would I have to install the fs dependency? If so, how?
Pepa J · 2y ago
No, the fs module is part of the Node.js installation.
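
For example, it can be imported directly, with nothing added to package.json:

import fs from 'node:fs'; // built into Node.js, no npm install needed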
conscious-sapphire (OP) · 2y ago
Does this work in an Actor? It only seems to work on my local computer.
Pepa J · 2y ago
@harish So I am not sure where you run it. This is #crawlee-js, so you should be fully in control of where and how you run it. Are you running it on the Apify Platform? Then you may send me a link to the run in a DM so I can check it.
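
To round out the original question (cleaning the scraped data and sending it to MongoDB), here is a hedged sketch of a follow-up Actor that reads the scraper's dataset, applies a cleaning step, and inserts the results. The MONGODB_URI environment variable, the database and collection names, and the cleaning logic are all placeholder assumptions; it assumes the Apify SDK v3 (apify) and the official mongodb driver:

import { Actor } from 'apify';
import { MongoClient } from 'mongodb';

await Actor.init();

// Read everything the scraper pushed into its dataset.
// (No argument reads this run's own default dataset; pass a dataset ID
// to read the output of a different Actor's run.)
const dataset = await Actor.openDataset();
const { items } = await dataset.getData();

// Placeholder cleaning step: trim the title; adapt to your MongoDB schema.
const cleaned = items.map((item: any) => ({
    ...item,
    title: String(item.title ?? '').trim(),
}));

// MONGODB_URI is an assumed environment variable, not an Apify built-in.
const client = new MongoClient(process.env.MONGODB_URI!);
await client.connect();
if (cleaned.length > 0) {
    await client.db('scraping').collection('products').insertMany(cleaned);
}
await client.close();

await Actor.exit();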
