Geonode Proxies

Hey, I'm having some trouble using the proxies provided by Geonode:

```js
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        "http://{username}:{password}@rotating-residential.geonode.com:9010",
    ],
});

logger.info("Setting up crawler.");
const crawler = new PlaywrightCrawler({
    proxyConfiguration,
```

This is the error I'm getting when Crawlee tries to enqueue the first URL fed to it:

```
Error in default url (array urls) Expected property values to be of type string but received type null in object options
```

When I run without proxies, everything works fine. Also, the username and password variables are replaced with proper data.
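For context, here is a minimal sketch of how this kind of static proxy URL is usually wired into a PlaywrightCrawler. The GEONODE_USERNAME/GEONODE_PASSWORD environment variable names, the requestHandler body, and the example start URL are assumptions for illustration, not from the original post:

```js
import { PlaywrightCrawler, ProxyConfiguration } from "crawlee";

// Assumed env var names; the original post interpolates {username}/{password} directly.
const username = process.env.GEONODE_USERNAME;
const password = process.env.GEONODE_PASSWORD;

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        `http://${username}:${password}@rotating-residential.geonode.com:9010`,
    ],
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    // Hypothetical handler, just to confirm the proxy works end to end.
    requestHandler: async ({ page, request, log }) => {
        log.info(`Loaded ${request.url} via proxy, title: ${await page.title()}`);
    },
});

await crawler.run(["https://example.com"]);
```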
5 Replies
fascinating-indigo · 3y ago
Are you running it on the platform? Can you share the link, please? Otherwise, please provide a screenshot of the log with the full error description.
crude-lavender (OP) · 3y ago
Could you explain what running on the platform would mean?

This is the complete log:

```
error: Error in default url (array urls) Expected property values to be of type string but received type null in object options {"crawlUUID":"b7fdb85d-26a6-46f2-9c0d-d3874426b459","name":"ArgumentError","stack":"ArgumentError: (array urls) Expected property values to be of type string but received type null in object options\n at ow (/Users//Desktop/REBI-scraper/node_modules/ow/dist/index.js:33:28)\n at enqueueLinks (/Users//Desktop/scraper/node_modules/@crawlee/core/enqueue_links/enqueue_links.js:93:22)\n at browserCrawlerEnqueueLinks (/Users/Desktop/REBI-scraper/node_modules/@crawlee/browser/internals/browser-crawler.js:409:37)\n at runNextTicks (node:internal/process/task_queues:60:5)\n at process.processImmediate (node:internal/timers:442:9)\n at process.callbackTrampoline (node:internal/async_hooks:130:17)\n at async Object.enqueue (file:///Users//Desktop/REBI-scraper/dist/crawler/sites/alo/scrape.js:15:9)\n at async file:///Users//Desktop/scraper/dist/crawler/routes.js:51:13\n at async wrap (/Users//Desktop/REBI-scraper/node_modules/@apify/timeout/index.js:52:21)","timestamp":"2023-04-14T14:35:59.812Z","validationErrors":{}}
```

Also, it is important to mention this part of the code:

```js
await page.waitForSelector(
    ".PaginationContainer--bottom .Pagination-item--next > .Pagination-link"
);
const nextButton = page.locator(
    ".PaginationContainer--bottom .Pagination-item--next > .Pagination-link"
);
const nextHref = await nextButton.getAttribute("data-href");
console.log("href", nextButton, nextHref);
await enqueueLinks({
    // CONSIDER REPLACING URLS WITH SELECTOR
    urls: [nextHref],
    label: "LIST",
    // limit: 6,
});
```

This is the element I'm extracting the hrefs from, and the value of nextHref is null when I go through the proxy. In contrast, nextHref is valid when not using a proxy.

What weirds me out is that when I run the crawler with a headful browser, I can see the first page actually opening and all the elements on it clearly visible, but the href still doesn't get found.
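As an aside on that snippet: the ArgumentError is thrown because `urls: [nextHref]` is passed while nextHref is null. A minimal sketch of a null guard, assuming it sits inside the same request handler (the selector and the LIST label come from the snippet above; the log message is illustrative):

```js
// Inside the requestHandler, after locating the "next" pagination link.
const nextHref = await nextButton.getAttribute("data-href");

if (nextHref) {
    // Only enqueue when the attribute was actually found; enqueueLinks rejects
    // a null entry in the urls array with the ArgumentError shown in the log.
    await enqueueLinks({
        urls: [nextHref],
        label: "LIST",
    });
} else {
    console.log("Next-page href not found; skipping pagination for this page.");
}
```

Note that the `selector` option mentioned in the comment, as far as I know, collects standard href attributes from the matched elements, so with a `data-href` attribute the explicit urls array plus a null guard may be the more reliable route.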
fascinating-indigo · 3y ago
The error is not about the proxy; it's about adding links to the queue:

```
Expected property values to be of type string but received type null in object options\n at ow (/Users//Desktop/REBI-scraper/node_modules/ow/dist/index.js:33:28)\n at enqueueLinks
```

Possibly the website blocks your proxy (and you need to test another proxy/proxy group). Try taking a screenshot of the loaded page and check what you have there. Also, try using Playwright + Firefox; it can help with blocks.
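A rough sketch of both suggestions combined, switching the crawler to Firefox and dumping a screenshot from the request handler so you can see what the page looks like through the proxy. The screenshot path and the handler body are illustrative, not from the thread:

```js
import { PlaywrightCrawler } from "crawlee";
import { firefox } from "playwright";

const crawler = new PlaywrightCrawler({
    proxyConfiguration, // the Geonode ProxyConfiguration from earlier
    launchContext: {
        // Launch Firefox instead of the default Chromium; this sometimes helps with blocking.
        launcher: firefox,
    },
    requestHandler: async ({ page, request }) => {
        // Save a screenshot to inspect what the proxy actually receives.
        await page.screenshot({ path: "proxy-check.png", fullPage: true });
        // ...rest of the handler (waitForSelector, enqueueLinks, etc.)
    },
});
```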
crude-lavender (OP) · 3y ago
But as previously mentioned, the headful browser opens and loads the page. The specific page I'm targeting asks for a captcha when it suspects harmful behaviour, and that does not happen in this case.
fascinating-indigo · 3y ago
Headful mode can also affect the likelihood of being blocked. If it works with headful mode, just keep using it.

> the href still doesn't get found

Re-check your selectors, ideally directly in the Puppeteer window; the element may have a different selector there. Just add some sleep() so you have time to inspect the page.
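A small sketch of that debugging setup, running headful and pausing with Crawlee's sleep() so the page stays open long enough to inspect. The 30-second pause is an arbitrary choice and the handler body is illustrative:

```js
import { PlaywrightCrawler, sleep } from "crawlee";

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    launchContext: {
        launchOptions: { headless: false }, // open a visible browser window for debugging
    },
    requestHandler: async ({ page }) => {
        await page.waitForSelector(
            ".PaginationContainer--bottom .Pagination-item--next > .Pagination-link"
        );
        // Pause so you can open devtools and verify the selector by hand.
        await sleep(30_000);
    },
});
```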
