Web scraper: create one JSON file instead of multiple outputs for pagination

I used this article https://docs.apify.com/academy/advanced-web-scraping/scraping-paginated-sites#define-and-enqueue-pivot-ranges to scrape data from multiple pages. When I run apify run, I get 20 different JSON files. How can I combine the data from all the pages into one JSON file?
7 Replies
xenophobic-harlequinOP•2y ago
Also very similar to this: https://docs.apify.com/sdk/js/docs/examples/add-data-to-dataset, but instead of having individual files, how can I have one file? @Andrey Bykov, can you please help me with this? I can't find a solution and have searched everywhere.
frail-apricot•2y ago
One call to Dataset.pushData() or Actor.pushData() produces one JSON record. I believe what you're looking for is https://crawlee.dev/api/core/class/Dataset#exportToJSON
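For illustration, a minimal sketch of that in a plain Crawlee project (the crawler setup and the .item selector are hypothetical):

import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $ }) {
        // Each pushData() call appends one record to the default dataset,
        // which is why scraping 20 pages leaves 20 JSON files locally.
        await Dataset.pushData({
            url: request.url,
            data: $('.item').map((_, el) => $(el).text()).get(),
        });
    },
});

await crawler.run(['https://example.com/page/1', 'https://example.com/page/2']);

// Export the whole default dataset as a single JSON file stored under
// the key 'results' in the default key-value store.
await Dataset.exportToJSON('results');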
xenophobic-harlequinOP•2y ago
Yes, I tried using await Dataset.pushData to create a single large file. The page https://crawlee.dev/docs/introduction/saving-data#whats-datasetpushdata says: "If you would like to store your data in a single big file, instead of many small ones, see the Result storage guide for Key-value stores." I want all the data from the 20 pages to be present in one file. If I use await Dataset.exportToJSON('results'); it just concatenates the per-page JSON files, but I want everything under a single data key. Right now it looks like this:
[
  {
    "url": "example.com/page/1",
    "data": []
  },
  {
    "url": "example.com/page/2",
    "data": []
  }
]
I want to have
{
  "data": [
    // data for all pages
  ]
}
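For reference, one way to get that shape in a plain Crawlee project is to merge the dataset after the crawl finishes and save the result as a single key-value store record (a sketch; the 'results' key name is arbitrary):

import { Dataset, KeyValueStore } from 'crawlee';

// Run this after crawler.run() has resolved.
const dataset = await Dataset.open();
const { items } = await dataset.getData(); // all per-page records

// Flatten every page's "data" array into one combined array.
const merged = { data: items.flatMap((item) => item.data) };

// A key-value store record is written as one file, unlike dataset items.
const store = await KeyValueStore.open();
await store.setValue('results', merged);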
frail-apricot•2y ago
Btw, I just realised this is the Crawlee forum; Apify-related questions should go on a separate forum. As for your question: in Web Scraper there's no easy way to do it, except using globalStore, I guess. See here: https://apify.com/apify/web-scraper#globalstore-object
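A rough sketch of that approach inside Web Scraper's pageFunction (assuming the async get()/set() API the actor README describes for globalStore; the selector and the isLastPage flag are hypothetical):

async function pageFunction(context) {
    const { request, globalStore } = context;

    // Accumulate each page's rows in the shared in-memory store instead
    // of returning them, so intermediate pages add no dataset records.
    const pageData = [...document.querySelectorAll('.item')].map((el) => el.textContent);
    const collected = (await globalStore.get('data')) || [];
    await globalStore.set('data', collected.concat(pageData));

    // Only the last page returns anything, producing one combined record.
    if (request.userData.isLastPage) {
        return { data: await globalStore.get('data') };
    }
}

Note the get/set pair is a read-modify-write, so with concurrent pages it can race; keeping concurrency at 1 would make the sketch safe.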
xenophobic-harlequinOP•2y ago
Sorry for posting to the wrong forum, I didn't notice that. What about this reduce? https://crawlee.dev/api/core/class/Dataset#reduce
frail-apricot•2y ago
Web Scraper does not have full access to Crawlee; only certain methods are exposed.
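In a plain local Crawlee project, though, reduce is available and could do the merge directly. A sketch:

import { Dataset } from 'crawlee';

const dataset = await Dataset.open();

// reduce() walks the dataset items like Array.prototype.reduce:
// concatenate every page's "data" array into one combined array.
const allData = await dataset.reduce((acc, item) => acc.concat(item.data), []);

// Save as one file, e.g. via a key-value store record as shown earlier.
console.log(allData.length);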
