Json2csv overwrites columns
Hello, I have an issue with json2csv: each line overwrites the previous one when I create the Excel file. I can see it in VS Code -> dataset while the xlsx file is being created: instead of adding multiple rows, it writes each new line over the previous one, and I end up with a file containing only one line. Please help. Thank you
This is my code:
import { PlaywrightCrawler, Dataset } from 'crawlee';
import { writeFileSync } from 'fs';
import { parse } from 'json2csv';

const crawler = new PlaywrightCrawler({
    requestHandler: async ({ page, request, enqueueLinks }) => {
        console.log(`Processing: ${request.url}`);
        if (request.label === 'DETAIL') {
            const branch_name = await page.locator('.store-details-right > h1 > span:nth-child(2)').textContent();
            const street_name = await page.locator('.info-wrapper > .info:nth-child(1) > .info-value').textContent();
            const results = {
                branch_name: branch_name,
                street_name: street_name,
            };
            const dataset = await Dataset.open('results-dataset');
            await dataset.pushData(results);
            const csv = parse(results);
            writeFileSync('results.csv', csv);
        } else {
            // await page.waitForSelector('.more-details');
            await enqueueLinks({
                selector: '.more-details',
                label: 'DETAIL',
            });
        }
    },
    headless: true,
});

await crawler.run(['https://www.example.com/']);
10 Replies
conscious-sapphire•3y ago
Since this request handler runs for every request, you re-create the same CSV file on every request, which is why it keeps getting overwritten. To solve this, keep pushing the results into the dataset inside the handler, and only after the crawl finishes await the dataset and store its contents as CSV.
absent-sapphireOP•3y ago
Many thanks for the answer bro.
Any chance of getting a snippet showing how to do that? Otherwise it might take me a few hours to figure it out. (I have little knowledge of JavaScript.)
Please? 🙂
sensitive-blue•3y ago
every time the requestHandler function runs, you create
const csv = parse(results);
and then write that to a file. However, the writeFileSync function deletes the whole previous contents of the file before writing the new content, so only the last processed request ends up in the file.
The easiest solution is probably to create a global array for the results before the crawler even starts, by adding
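something like this (a minimal sketch; the allResults name is just an example, use whatever you like):

const allResults = [];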
And then in the request handler, just allResults.push(results)
instead of parsing it to CSV and writing the file.
And then only write the result into the file after the whole crawl finishes:
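Roughly like this (a sketch, assuming the same parse and writeFileSync imports you already have in main.js):

// after the crawl is done
await crawler.run(['https://www.example.com/']);

// serialize everything collected during the crawl in one go
const csv = parse(allResults);
writeFileSync('results.csv', csv);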
absent-sapphireOP•3y ago
Thank you very much. I tried it now and got this error, do you know why?
throw new Error('Data should not be empty or the "fields" option should be included');
^
Error: Data should not be empty or the "fields" option should be included
sensitive-blue•3y ago
Can you please copy-paste or screenshot the whole error? It should include a backtrace showing where it happened.
absent-sapphireOP•3y ago
C:\Scrapers\Roladin\my-crawler\node_modules\json2csv\lib\JSON2CSVParser.js:57
throw new Error('Data should not be empty or the "fields" option should be included');
^
Error: Data should not be empty or the "fields" option should be included
at JSON2CSVParser.preprocessData (C:\Scrapers\Roladin\my-crawler\node_modules\json2csv\lib\JSON2CSVParser.js:57:13)
at JSON2CSVParser.parse (C:\Scrapers\Roladin\my-crawler\node_modules\json2csv\lib\JSON2CSVParser.js:20:32)
at module.exports.parse (C:\Scrapers\Roladin\my-crawler\node_modules\json2csv\lib\json2csv.js:15:65)
at file:///C:/Scrapers/Roladin/my-crawler/src/main.js:227:13
sensitive-blue•3y ago
"Data should not be empty (...)" comes from the CSV serialization - the allResults array is most likely empty due to some mistake. Can you share the updated code?
absent-sapphireOP•3y ago
I tried to change from this
const csv = parse(allResults);
To this
const csv = parse([allResults]);
This time it created an empty file
UP @mvolfik
Ok solved thanks!