deep-jade

saving data in apify actor

I've tried saving the data I scrape in my actors to a rawData.json file. The scraping works, but I don't get any JSON output. How would I save the data to the Apify console so that I can then take it and put it in my MongoDB database? I already have my MongoDB schema set up. Would I have to save it to the Apify dataset, and if so, how? How would I also put it through a cleaning process, in the same actor or, if possible, a different actor, and THEN save it to MongoDB? Would I have to install fs somehow in the Apify console to make this work? Here's what I have for saving the JSON file so far:
3 Replies
deep-jade (OP), 2y ago
bambawRouter.addHandler('BAMBAW_PRODUCT', async ({ page, request }) => {
    try {
        console.log('Scraping products');

        const site = 'Bambaw';

        const title = await page.$eval('h1.product__title', (el) => el.textContent?.trim() || '');

        const descriptions = await ......

        const productData = {
            url: request.loadedUrl,
            site,
            title,
            descriptions,
            originalPrice,
            salePrice,
            shippingInfo,
            reviewScore,
            reviewNumber,
        };

        productList.push(productData);

        console.log('Scraped ', productList.length, ' products');

        // Read the existing data from the rawData.json file
        let rawData: any = {};
        try {
            const rawDataStr = fs.readFileSync('rawData.json', 'utf8');
            rawData = JSON.parse(rawDataStr);
        } catch (error) {
            console.log('Error reading rawData.json:', error);
        }

        // Append the new data to the existing data
        if (rawData.productList) {
            rawData.productList.push(productData);
        } else {
            rawData.productList = [productData];
        }

        // Write the updated data back to the rawData.json file
        fs.writeFileSync('rawData.json', JSON.stringify(rawData, null, 2));
        console.log('rawData.json updated for Bambaw');
    } catch (error) {
        console.log('Error scraping product:', error);
        bambawQueue.reclaimRequest(request);
        return;
    }
});
Pepa J, 2y ago
I think
await Actor.pushData(productData);
is probably what you want. This will put one item into the dataset.
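For the cleaning step asked about above, here is a rough sketch of how it could sit in front of the push, in the same actor. Treat it as an illustration under assumptions, not the one right way: `parsePrice`, `cleanProduct`, `saveProduct` and the `DatasetSink` type are made-up names, the prices are assumed to be scraped as strings, and only a few of the `productData` fields are shown. In a real actor the sink would forward to `Actor.pushData`, with `Actor` imported from the `apify` package.

```typescript
// Shape of one scraped item. Field names follow productData from the question;
// the string types for the prices are an assumption.
type RawProduct = {
    url: string;
    site: string;
    title: string;
    originalPrice: string;
    salePrice: string;
};

type CleanProduct = {
    url: string;
    site: string;
    title: string;
    originalPrice: number | null;
    salePrice: number | null;
};

// Anything that can store one item. In an actor this could be:
//   import { Actor } from 'apify';
//   const sink: DatasetSink = { pushData: (item) => Actor.pushData(item) };
type DatasetSink = {
    pushData(item: Record<string, unknown>): Promise<void>;
};

// Hypothetical helper: pulls the first number out of a price string such as "€12,90".
// It treats a comma as the decimal separator and does not handle thousands separators.
function parsePrice(text: string): number | null {
    const match = text.replace(',', '.').match(/\d+(\.\d+)?/);
    return match ? Number(match[0]) : null;
}

// Cleaning step: collapse whitespace in the title and convert prices to numbers.
function cleanProduct(raw: RawProduct): CleanProduct {
    return {
        url: raw.url,
        site: raw.site,
        title: raw.title.replace(/\s+/g, ' ').trim(),
        originalPrice: parsePrice(raw.originalPrice),
        salePrice: parsePrice(raw.salePrice),
    };
}

// Clean one item, then push it to whatever sink was passed in.
async function saveProduct(sink: DatasetSink, raw: RawProduct): Promise<CleanProduct> {
    const clean = cleanProduct(raw);
    await sink.pushData(clean);
    return clean;
}
```

Inside the handler you would then call something like `await saveProduct(sink, productData)` instead of the `fs` read/write. Keeping the sink as a parameter is only there so the cleaning logic can be tried locally without the platform.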
deep-jade (OP), 2y ago
How do I define Actor in the console without getting an error during the build?
