File download causes: waiting until "load" error

If a link on the website points to a page, e.g. <a href='page.html'>link</a>, everything works fine, but if it points to a file, e.g. <a href='image.png'>link</a>, I get this error:
ERROR PlaywrightCrawler: Request failed and reached maximum retries. page.goto: net::ERR_ABORTED at https://mysite.com/?attachment_id=24365
=========================== logs ===========================
navigating to "https://mysite.com/?attachment_id=24365", waiting until "load"
============================================================
at gotoExtended (/path/to/crawler/node_modules/@crawlee/playwright/internals/utils/playwright-utils.js:153:17)
at PlaywrightCrawler._navigationHandler (/path/to/crawler/node_modules/@crawlee/playwright/internals/playwright-crawler.js:113:52)
at PlaywrightCrawler._handleNavigation (/path/to/crawler/node_modules/@crawlee/browser/internals/browser-crawler.js:285:51)
at async PlaywrightCrawler._runRequestHandler (/path/to/crawler/node_modules/@crawlee/browser/internals/browser-crawler.js:227:13)
at async PlaywrightCrawler._runRequestHandler (/path/to/crawler/node_modules/@crawlee/playwright/internals/playwright-crawler.js:110:9)
at async wrap (/path/to/crawler/node_modules/@apify/timeout/index.js:52:21) {"id":"KDiuQfhWBLWsv5N","url":"https://mysite.com/?attachment_id=24365","method":"GET","uniqueKey":"https://mysite.com?attachment_id=24365"}
INFO PlaywrightCrawler: Request https://mysite.com/?attachment_id=24365 failed too many times.
I've been going through the docs for a couple of days now, but I have no idea what to do. I think this is related to the download event. I would like to skip all file downloads. Any ideas?
1 Reply
extended-salmon (3y ago)
Hey there! This particular link leads to the request being aborted, so the error Crawlee shows is valid. I would say the easiest way is to filter the requests that you enqueue, since you know which URLs are regular pages and which are file downloads.
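
A minimal sketch of that filtering idea, under some assumptions: `looksLikeDownload` and `SKIPPED_EXTENSIONS` are hypothetical names, the extension list is only a guess at what the site serves, and treating `attachment_id` links as downloads is inferred from the URL in the log above, so adjust it to your site.

```typescript
// Hypothetical list of extensions treated as file downloads (an assumption).
const SKIPPED_EXTENSIONS: readonly string[] = [
    '.png', '.jpg', '.jpeg', '.gif', '.svg', '.webp',
    '.pdf', '.zip', '.doc', '.docx', '.xls', '.xlsx', '.mp3', '.mp4',
];

// Hypothetical helper: returns true when a URL looks like a file download
// rather than an HTML page that the browser can navigate to.
function looksLikeDownload(url: string): boolean {
    let parsed: URL;
    try {
        parsed = new URL(url);
    } catch {
        // Not an absolute URL; nothing here identifies it as a download.
        return false;
    }
    // Compare against the path only, so query strings do not hide the extension.
    const path = parsed.pathname.toLowerCase();
    if (SKIPPED_EXTENSIONS.some((ext) => path.endsWith(ext))) {
        return true;
    }
    // Assumption: on this site, ?attachment_id=... links resolve to files.
    return parsed.searchParams.has('attachment_id');
}

// Possible use inside a Crawlee requestHandler (not executed here).
// Returning false from transformRequestFunction skips the request:
//
// await enqueueLinks({
//     transformRequestFunction: (request) =>
//         looksLikeDownload(request.url) ? false : request,
// });
```

With a filter like this, links such as `image.png` should never reach `page.goto`, so they would not show up as failed requests.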
