Geonode Proxies
Hey, I'm having some trouble trying to use the proxies provided by Geonode.
import { PlaywrightCrawler, ProxyConfiguration } from "crawlee";

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        "http://{username}:{password}@rotating-residential.geonode.com:9010",
    ],
});

logger.info("Setting up crawler.");
const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    // ...request handlers and other options omitted
});
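(A quick way to rule out the proxy string itself is to ask Crawlee for the URL it resolved; a minimal sanity check, assuming the proxyConfiguration from the snippet above:)

    // newUrl() is part of Crawlee's ProxyConfiguration API; it returns
    // the proxy URL that will be handed to the browser.
    const proxyUrl = await proxyConfiguration.newUrl();
    logger.info(`Using proxy: ${proxyUrl}`);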
This is the error I'm getting when Crawlee tries to enqueue the first URL fed to it:
Error in default url (array urls) Expected property values to be of type string but received type null in object options
When I try to run without proxies, everything works fine. Also, the username and password variables are replaced with proper data.
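One common gotcha when substituting credentials into a proxy URL: special characters in the password (for example "@" or ":") must be URL-encoded, or the URL will not parse. A hypothetical sketch, with GEONODE_USERNAME and GEONODE_PASSWORD as made-up environment variable names:

    // Hypothetical env vars; encodeURIComponent keeps special characters
    // in the credentials from breaking the proxy URL.
    const username = encodeURIComponent(process.env.GEONODE_USERNAME ?? "");
    const password = encodeURIComponent(process.env.GEONODE_PASSWORD ?? "");
    const proxyConfiguration = new ProxyConfiguration({
        proxyUrls: [
            `http://${username}:${password}@rotating-residential.geonode.com:9010`,
        ],
    });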
fascinating-indigo•3y ago
Are you running it on the platform?
Can you share the link, please?
Otherwise, please provide a screenshot of the log with the full error description.
crude-lavenderOP•3y ago
Could you explain what running it on the platform would mean?
This is the complete log:
error: Error in default url (array urls) Expected property values to be of type string but received type null in object options {"crawlUUID":"b7fdb85d-26a6-46f2-9c0d-d3874426b459","name":"ArgumentError","stack":"ArgumentError: (array urls) Expected property values to be of type string but received type null in object options\n at ow (/Users//Desktop/REBI-scraper/node_modules/ow/dist/index.js:33:28)\n at enqueueLinks (/Users//Desktop/scraper/node_modules/@crawlee/core/enqueue_links/enqueue_links.js:93:22)\n at browserCrawlerEnqueueLinks (/Users/Desktop/REBI-scraper/node_modules/@crawlee/browser/internals/browser-crawler.js:409:37)\n at runNextTicks (node:internal/process/task_queues:60:5)\n at process.processImmediate (node:internal/timers:442:9)\n at process.callbackTrampoline (node:internal/async_hooks:130:17)\n at async Object.enqueue (file:///Users//Desktop/REBI-scraper/dist/crawler/sites/alo/scrape.js:15:9)\n at async file:///Users//Desktop/scraper/dist/crawler/routes.js:51:13\n at async wrap (/Users//Desktop/REBI-scraper/node_modules/@apify/timeout/index.js:52:21)","timestamp":"2023-04-14T14:35:59.812Z","validationErrors":{}}
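(For context, the stack shows the error originating in ow, the library enqueueLinks uses to validate its options: any null entry in the urls array fails the string check before crawling even starts. A minimal sketch that reproduces it, assuming a Crawlee request handler context:)

    // A null entry in `urls` makes enqueueLinks' option validation (ow)
    // throw the ArgumentError above before any request is enqueued.
    await enqueueLinks({ urls: [null] }); // throws ArgumentError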
Also, it is important to mention this part of the code:
// Wait for the pagination "next" link to appear.
await page.waitForSelector(
    ".PaginationContainer--bottom .Pagination-item--next > .Pagination-link"
);
const nextButton = page.locator(
    ".PaginationContainer--bottom .Pagination-item--next > .Pagination-link"
);
// Note: getAttribute() resolves to null when the attribute is absent.
const nextHref = await nextButton.getAttribute("data-href");
console.log("href", nextButton, nextHref);
await enqueueLinks({
    // CONSIDER REPLACING URLS WITH SELECTOR
    urls: [nextHref],
    label: "LIST",
    // limit: 6,
});
This is the element I'm extracting the href from, and the value of nextHref is null whenever I go through the proxy. In contrast, nextHref is valid when not using the proxy.
What weirds me out is that when I run the crawler with a headful browser, I can see the first page actually opening, with all the elements on it clearly visible, but the href still doesn't get found.
fascinating-indigo•3y ago
The error is not about the proxy.
It's about adding links to the queue:
Expected property values to be of type string but received type null in object options\n at ow (/Users//Desktop/REBI-scraper/node_modules/ow/dist/index.js:33:28)\n at enqueueLinks
Possibly, the website blocks your proxy (and you need to test another proxy/proxy group).
Try taking a screenshot of the loaded page and check what you actually have there.
Also, try Playwright + Firefox. It can help with blocks.
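(A sketch of both suggestions together, assuming the proxyConfiguration from earlier; the debug screenshot path is illustrative, and launchContext.launcher is Crawlee's standard way to pick a browser:)

    import { PlaywrightCrawler } from "crawlee";
    import { firefox } from "playwright";

    const crawler = new PlaywrightCrawler({
        proxyConfiguration,
        // Swap the default Chromium for Firefox; some blocks are engine-specific.
        launchContext: { launcher: firefox },
        async requestHandler({ page, request }) {
            // Capture what actually loaded through the proxy.
            await page.screenshot({ path: "debug.png", fullPage: true });
            console.log("Loaded", request.loadedUrl);
            // ...rest of the handler
        },
    });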
crude-lavenderOP•3y ago
But as previously mentioned, the headful browser opens and loads the page.
This specific page I'm targeting asks for a captcha when it suspects harmful behaviour, and that does not happen in this case.
fascinating-indigo•3y ago
Headful mode can also have an impact on the likelihood of being blocked.
If it works with headful mode, just keep using it.
If the href still doesn't get found, re-check your selectors. Maybe even check them directly in the opened browser window; the element may have a different selector there. Just add some sleep() to give yourself time to inspect the page.
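(A minimal sketch of that advice combined with a guard against the null that caused the original ArgumentError, assuming the locator and handler context from the earlier snippet; sleep and log are exported by crawlee:)

    import { log, sleep } from "crawlee";

    // Pause so there is time to inspect the page in the headful window.
    await sleep(10_000);

    const nextHref = await nextButton.getAttribute("data-href");
    if (nextHref) {
        await enqueueLinks({ urls: [nextHref], label: "LIST" });
    } else {
        // Never pass null to enqueueLinks; record evidence instead.
        log.warning(`data-href not found on ${request.loadedUrl}`);
        await page.screenshot({ path: "missing-next-link.png" });
    }

As a side note, Playwright types getAttribute() as Promise<string | null>, so strict TypeScript would flag urls: [nextHref] at compile time.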