Pagination works locally in Crawlee, but the same actor on Apify does not paginate correctly
I have implemented pagination that can start from, e.g., page 2 and end at page 5 (inclusive), scraping all the data from each page. It works correctly on my local machine. I pushed the newest working code (newest commit id) to GitHub and then to Apify via a webhook. However, when I run the actor on Apify.com, it starts at the first page instead of page 2 and does not finish at page 5 (inclusive). Any suggestions on what might be wrong?
6 Replies
graceful-beige•3y ago
You might try to use:
from your local repo and then test that build on the Apify platform; at least you can make sure your latest version is built. Or it might be an actor input issue, if you made it configurable.
unwilling-turquoiseOP•3y ago
I have verified that the commit id in the latest build I ran the scraper with is the latest commit on the GitHub master branch, so I would not suspect this to be the issue, and I can also see that the latest change (test logging) was in the latest run as well. But I guess it never hurts to try it out. Input also seems to work; however, locally it is a string and on Apify it is a number, though this does not explain why the pagination still just cuts off at page 4?
graceful-beige•3y ago
Hmm, I can't generate more ideas without seeing at least something) I once had an issue where a selector was missing when I ran the crawler on the Apify platform, but it worked perfectly locally. I can't recall what was causing that.
ratty-blush•3y ago
Copy the input from the Apify cloud run to the localhost kvstore and see if the issue is related to input parsing
Also make sure to run locally as
apify run -p
otherwise state data might interfere with scraping and cause side effects
probable-pink•3y ago
Just screenshot it or post the code here - https://docs.apify.com/academy/node-js/analyzing-pages-and-fixing-errors#with-the-apify-sdk
unwilling-turquoiseOP•3y ago
Yeah, that could be an issue. However, I change the URL to navigate to the page I want, and then, once I have started on the intended page, I find the next-page button link.
Thanks, I will look into that.
In my actor I specify the input param startFromPageNumber as a string and parse it to an int locally. Would this fail on apify.com?
that would explain some issues then
or could using global variable cause problems?
I can see that the start URL, which uses "url.searchParams.append("page", startFromPageNumber.toString())", does not add the page number to the query
I suspect there is an issue with parsing this: "const startFromPageNumber: number = input.startFromPageNumber;"
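For anyone hitting the same string-versus-number mismatch between a local input file and the Apify input form, here is a minimal sketch of normalizing the value before using it. `toPageNumber` and its fallback of 1 are hypothetical names and choices for illustration, not part of Crawlee or the Apify SDK, and this is not necessarily what caused the problem in this thread.

```typescript
// Hypothetical helper (not a Crawlee/Apify API): accept a page number that
// arrives either as a number (e.g. from an input form) or as a numeric string
// (e.g. from a hand-written local input file).
function toPageNumber(value: unknown, fallback = 1): number {
    const parsed = typeof value === 'number'
        ? value
        : Number.parseInt(String(value), 10);
    // Reject NaN, fractions, zero and negatives; use the fallback instead.
    return Number.isInteger(parsed) && parsed >= 1 ? parsed : fallback;
}
```

With this, `toPageNumber(input.startFromPageNumber)` gives the same number whether the input holds `2` or `"2"`, so the rest of the code does not need to care where it runs.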
It also seems that even though I push the actor using "npx apify-cli push", the actor does not get updated, because I cannot see the console.log messages I have added locally
I have a trigger that executes a new build when I add code to the GitHub main branch
I have solved the issue now. It was a bug in my implementation of the query params. It would be really nice if such functionality were added to Crawlee and Apify in the future to reduce the risk of bugs 🙂
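The exact bug is not shown in the thread, so the following is only a sketch of one way to build page URLs for an inclusive range using the standard `URL` API. `buildPageUrl`, `buildPageUrls`, and the example.com addresses are hypothetical. It uses `searchParams.set` rather than `append`, because `append` adds a second `page` entry when the URL already contains one.

```typescript
// Hypothetical helpers (not part of Crawlee or the Apify SDK).

// Return baseUrl with its "page" query parameter set to the given page.
function buildPageUrl(baseUrl: string, page: number): string {
    const url = new URL(baseUrl);
    // set() replaces an existing "page" value; append() would add a duplicate.
    url.searchParams.set('page', String(page));
    return url.toString();
}

// Build one URL per page for the inclusive range [first, last].
function buildPageUrls(baseUrl: string, first: number, last: number): string[] {
    const urls: string[] = [];
    for (let page = first; page <= last; page++) {
        urls.push(buildPageUrl(baseUrl, page));
    }
    return urls;
}
```

Logging the result of `buildPageUrls(startUrl, 2, 5)` both locally and on the platform is a quick way to check that the start and end pages are what you expect before the crawler runs.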