Crawlee & Apify•10mo ago

proxy not working today?

Yesterday I was getting around my JavaScript skill issues by building a simple HTTP crawler in Elixir using the HTTPoison module. It worked. Today I ran the same code again and I’m getting an error. I tried different proxy groups with no luck.

4 Replies

xenial-blackOP•10mo ago

Any ideas on how I could troubleshoot? I checked my proxy token in the console: no issues, and I have enough credit.

flat-fuchsia•10mo ago

Well, can you share the code base?

xenial-blackOP•10mo ago

Thanks for replying. The problem seemed to be that I was using Task.async_stream, which seems to have been hammering the proxy endpoint all at once. Here's a simplified version of what I was using before:

def crawler do
  Apps.list_apps()
  |> Task.async_stream(&update_app/1)
end

def update_app(app) do
  url = app.url

  # Route the request through the Apify proxy (RESIDENTIAL group).
  case HTTPoison.get(url, [],
         timeout: 10_000,
         recv_timeout: 10_000,
         follow_redirect: true,
         proxy: {"proxy.apify.com", 8000},
         proxy_auth: {"groups-RESIDENTIAL", @apify_proxy}
       ) do
    # ....
    # handles a bunch of errors
  end
end

Here's the error I was getting:

[error] proxy error: "HTTP/1.1 590 UPSTREAM400\r\nConnection: close\r\nDate: Thu, 08 Aug 2024 14:08:48 GMT\r\nContent-Length: 0\r\n\r\n"
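
One possible way to keep the requests from landing on the proxy all at once is to cap the concurrency of Task.async_stream. This is only a minimal sketch, not the original crawler: the module name ThrottledCrawl, the run/3 function, and the default limit of 5 are assumptions made for illustration, and `fetch` stands in for update_app/1. Two details that may matter here: Task.async_stream is lazy, so nothing runs until the stream is consumed, and its default per-task timeout is 5 seconds, which is shorter than the 10-second HTTPoison timeouts above.

```elixir
defmodule ThrottledCrawl do
  # Hypothetical helper: runs `fetch` over `items` with a bounded number
  # of concurrent tasks. `fetch` is a one-argument function, for example
  # &update_app/1 from the snippet above.
  def run(items, fetch, opts \\ []) do
    # Assumed default of 5; without this option Task.async_stream uses
    # System.schedulers_online() as the limit.
    max = Keyword.get(opts, :max_concurrency, 5)

    items
    |> Task.async_stream(fetch,
      max_concurrency: max,
      # Give each task more time than the HTTP timeouts it wraps.
      timeout: 30_000,
      # Emit {:exit, :timeout} for a slow task instead of crashing the caller.
      on_timeout: :kill_task
    )
    # The stream is lazy; Enum.to_list/1 forces it to actually run.
    |> Enum.to_list()
  end
end
```

For example, ThrottledCrawl.run(Apps.list_apps(), &update_app/1, max_concurrency: 3) would return a list of {:ok, result} or {:exit, :timeout} tuples in input order, with at most three requests in flight at a time.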

I'm going to try rebuilding the crawler with Crawly which seems to be a port of Crawlee to Elixir. It might have something to do with your reply to my other thread <#1270567482398081084>

GitHub - elixir-crawly/crawly: Crawly, a high-level web crawling & scraping framework for Elixir.
