Scrape behind a login using the Puppeteer template
Hello, I'm still new to scraping and Node.js. I need to log in to a website to scrape the data I need. I'm unsure whether I should use the same code from the following documentation: https://docs.apify.com/tutorials/log-into-a-website-using-puppeteer#save-and-reuse-cookies, and where I should place it. I assume it goes in main.js, but do I need to open a new browser?
extended-salmon•3y ago
For regular websites, filling in the login form and crawling with sessions enabled will work fine. For more advanced sites such as social networks, where a login from a new device has to be confirmed (e.g. by 2FA or an emailed code), you need to save and reuse the cookies. Try a regular login with a temporary account if possible.
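A minimal sketch of the "save and reuse cookies" idea, assuming a Puppeteer page object (only Puppeteer's page.cookies() and page.setCookie() are used). The cookieJar Map and the saveCookies/restoreCookies helpers are hypothetical; in a real actor you would more likely persist the cookies (for example in a key-value store) so they survive restarts.

```javascript
// Hypothetical in-memory cookie jar keyed by an arbitrary label (e.g. a session id).
// Assumption: in a real actor you would persist these instead of keeping them in memory.
const cookieJar = new Map();

// After a successful login, read the current page's cookies and store them.
async function saveCookies(page, key) {
  const cookies = await page.cookies(); // Puppeteer: cookies for the page's current URL
  cookieJar.set(key, cookies);
  return cookies.length;
}

// Before navigating a fresh page, put the stored cookies back.
// Returns false when nothing was stored, i.e. you still need to log in.
async function restoreCookies(page, key) {
  const cookies = cookieJar.get(key);
  if (!cookies || cookies.length === 0) return false;
  await page.setCookie(...cookies);
  return true;
}
```

In a PuppeteerCrawler you might call restoreCookies from a pre-navigation hook and saveCookies right after the login step; how exactly to wire it in depends on your crawler setup.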
ambitious-aquaOP•3y ago
Thank you @Alexey Udovydchenko (I fell asleep). Thankfully the regular login worked: I managed to log in with sessions enabled, but I think my code is incomplete, because I was logged out after a few requests. I got a few warnings of this type:
2022-09-21T11:32:12.720Z WARN PuppeteerCrawler: Reclaiming failed request back to the list or queue. Navigation timed out after 60 seconds.
I suppose the cookies are short-lived and I need to save them and transfer them to a new page.
extended-salmon•3y ago
More likely the page is really heavy (it genuinely needs more than 60 seconds to load) or the proxy is slow; try taking a page snapshot. If the server logged you out, that is usually reflected in the page URL, e.g. instead of the target URL the crawler lands at
.com/login?next=...
or in the HTTP status codes of the responses, e.g. https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401
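To act on that advice, a crawler can check each loaded page for the two signals mentioned: a redirect to a login URL and a 401 status. Below is a minimal sketch of such a check as a pure function, assuming the site's login page path is /login. The name looksLoggedOut and the inclusion of 403 are hypothetical choices you would adapt to the target site.

```javascript
// Hypothetical heuristic: decide whether a loaded page means the session was logged out.
// Assumption: 401 (and, as a guess, 403) signal lost authentication on this site.
const LOGGED_OUT_STATUSES = new Set([401, 403]);

function looksLoggedOut(finalUrl, status) {
  // Signal 1: the server answered with an auth-related status code.
  if (LOGGED_OUT_STATUSES.has(status)) return true;

  // Signal 2: we ended up on the login page instead of the requested URL.
  // Assumption: the login page lives at "/login" (query such as ?next=... is ignored).
  let pathname;
  try {
    pathname = new URL(finalUrl).pathname;
  } catch {
    return false; // not a parseable absolute URL, so don't guess
  }
  return pathname === "/login" || pathname.startsWith("/login/");
}
```

Inside a request handler you could pass page.url() and response.status() to it and throw an error (or retire the session) when it returns true, so the request is retried instead of scraping the login page.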