Crawlee & Apify•3y ago

Scrape behind a login using the Puppeteer template

Hello, I'm still new to scraping and Node.js. I need to log in to a website to scrape the data I need. I'm unsure whether I should use the code from this documentation: https://docs.apify.com/tutorials/log-into-a-website-using-puppeteer#save-and-reuse-cookies, and where I should place it. I assume it goes in main.js, but do I need to open a new browser?



extended-salmon•3y ago

For regular websites, filling in the login form and crawling with sessions enabled will work fine. For more advanced sites such as social networks, where a login from a new device has to be confirmed (e.g. by 2FA or an email code), you need to save and reuse the cookies. Try a regular login with a temporary account if possible.
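A minimal sketch of the "fill in the login form" step, assuming a plain username/password form. The URL, the selectors and the `logIn` helper name are hypothetical and must be replaced with the real ones from the target site. The only Puppeteer page methods used are `goto`, `type`, `click`, `waitForNavigation` and `cookies`.

```javascript
// Hypothetical login URL and selectors; inspect the target site to find the real ones.
const LOGIN_URL = 'https://example.com/login';

// Logs in on an existing Puppeteer page and returns the cookies set by the site.
async function logIn(page, username, password) {
    await page.goto(LOGIN_URL);
    await page.type('#username', username);
    await page.type('#password', password);

    // Start waiting for the navigation before clicking, so the redirect
    // triggered by the submit button is not missed.
    await Promise.all([
        page.waitForNavigation(),
        page.click('button[type="submit"]'),
    ]);

    // The returned cookies can be stored and later restored on another page
    // with page.setCookie(...cookies) if the session has to be reused.
    return page.cookies();
}
```

The helper takes the page as a parameter, so it can be called from wherever the crawler already has a page open and does not need to launch a separate browser.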

ambitious-aquaOP•3y ago

Thank you @Alexey Udovydchenko (I fell asleep). Thankfully the regular login worked: I managed to log in with sessions enabled, but I think my code is incomplete, because I was logged out after a few requests. I got several warnings like this:

2022-09-21T11:32:12.720Z WARN  PuppeteerCrawler: Reclaiming failed request back to the list or queue. Navigation timed out after 60 seconds.

I suppose the cookies are short-lived and I need to save them and transfer them to each new page.

extended-salmon•3y ago

More likely the page is really heavy (it genuinely needs more than 60 seconds to load) or the proxy is slow; try taking a page snapshot. If the server logs you out, that usually shows in the page URL, i.e. instead of the target URL the crawler lands at .com/login?next=..., or in the HTTP status codes of the responses, i.e. https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401
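The two logout signals described above could be checked with something like the following sketch. The `looksLoggedOut` name and the `/login` path prefix are assumptions for illustration; the real login path depends on the site.

```javascript
// Hypothetical helper: returns true when a response suggests the session was lost.
// Assumes the site's login page lives under "/login"; adjust for the real site.
function looksLoggedOut(requestedUrl, finalUrl, statusCode) {
    // A 401 response means the server rejected the request as unauthenticated.
    if (statusCode === 401) return true;

    const requested = new URL(requestedUrl);
    const landed = new URL(finalUrl);

    // Landing on the login page without having asked for it means the
    // server redirected the crawler there, e.g. to /login?next=...
    const askedForLogin = requested.pathname.startsWith('/login');
    const landedOnLogin = landed.pathname.startsWith('/login');
    return landedOnLogin && !askedForLogin;
}
```

Inside a page handler this might be called with the request URL, `page.url()` and `response.status()`, assuming the handler receives the Puppeteer page and response objects. A navigation timeout on its own would not trip this check, which helps tell a slow page apart from an expired session.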
