Json2csv overwrites columns
Hello, I have an issue with json2csv: each line overwrites the previous one when I create the Excel file. I can see it in VS Code -> dataset while the xlsx file is being created: instead of adding multiple rows, it writes each new line over the previous one, and I end up with a file containing only one line. Please help. Thank you
This is my code:
import { PlaywrightCrawler, Dataset } from 'crawlee';
import { writeFileSync } from 'fs';
import { parse } from 'json2csv';

const crawler = new PlaywrightCrawler({
    requestHandler: async ({ page, request, enqueueLinks }) => {
        console.log(`Processing: ${request.url}`);
        if (request.label === 'DETAIL') {
            const branch_name = await page.locator('.store-details-right > h1 > span:nth-child(2)').textContent();
            const street_name = await page.locator('.info-wrapper > .info:nth-child(1) > .info-value').textContent();
            const results = {
                branch_name: branch_name,
                street_name: street_name,
            };
            const dataset = await Dataset.open('results-dataset');
            await dataset.pushData(results);
            const csv = parse(results);
            writeFileSync('results.csv', csv);
        } else {
            // await page.waitForSelector('.more-details');
            await enqueueLinks({
                selector: '.more-details',
                label: 'DETAIL',
            });
        }
    },
    headless: true,
});

await crawler.run(['https://www.example.com/']);
10 Replies
conscious-sapphire•3y ago
Since this request handler runs for every request, you re-create the same CSV file on every request, which is why it keeps getting overwritten. To solve this, keep pushing the results into the dataset inside the handler, and only after the crawl finishes await the dataset and store its contents as CSV.
absent-sapphireOP•3y ago
Many thanks for the answer bro.
Any chance of getting a snippet showing how to do that? Otherwise it might take me a few hours to figure it out. (I have little knowledge of JavaScript.)
Please? 🙂
sensitive-blue•3y ago
every time the requestHandler function runs, you create
const csv = parse(results);
and then write that to a file. However, the writeFileSync function deletes the whole previous contents of the file before writing the new content, so only the last processed request ends up in the file.
The easiest solution is probably to create a global array for the results before the crawler even starts, by adding
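something like this (a minimal sketch; the allResults name is just an example, use whatever you like):

const allResults = [];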
And then in the request handler, just allResults.push(results)
instead of parsing it to CSV and writing the file.
And then only write the result into the file after the whole crawl finishes:
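Roughly like this (a sketch, assuming the same parse and writeFileSync imports you already have in main.js):

// after the crawl is done
await crawler.run(['https://www.example.com/']);

// serialize everything collected during the crawl in one go
const csv = parse(allResults);
writeFileSync('results.csv', csv);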
absent-sapphireOP•3y ago
Thank you very much. I tried it now and got this error, do you know why?
throw new Error('Data should not be empty or the "fields" option should be included');
^
Error: Data should not be empty or the "fields" option should be included
sensitive-blue•3y ago
Can you please copy-paste or screenshot the whole error? It should include a backtrace showing where it happened.
absent-sapphireOP•3y ago
C:\Scrapers\Roladin\my-crawler\node_modules\json2csv\lib\JSON2CSVParser.js:57
throw new Error('Data should not be empty or the "fields" option should be included');
^
Error: Data should not be empty or the "fields" option should be included
at JSON2CSVParser.preprocessData (C:\Scrapers\Roladin\my-crawler\node_modules\json2csv\lib\JSON2CSVParser.js:57:13)
at JSON2CSVParser.parse (C:\Scrapers\Roladin\my-crawler\node_modules\json2csv\lib\JSON2CSVParser.js:20:32)
at module.exports.parse (C:\Scrapers\Roladin\my-crawler\node_modules\json2csv\lib\json2csv.js:15:65)
at file:///C:/Scrapers/Roladin/my-crawler/src/main.js:227:13
sensitive-blue•3y ago
"Data should not be empty (...)" comes from the CSV serialization - the allResults array is most likely empty due to some mistake. Can you share the updated code?
absent-sapphireOP•3y ago
I tried to change from this
const csv = parse(allResults);
To this
const csv = parse([allResults]);
This time it created an empty file
UP @mvolfik
Ok solved thanks!