Website Content Crawler Execution Speed

Hello, in the Website Content Crawler actor (developed by Apify), it's often taking longer than 2 minutes to scrape a single page. I'm wondering: if we change the input param "requestTimeoutSecs": 60, does that force the actor to go faster? What other levers do we have to control execution speed? I understand asynchronous runs for website crawls, but I don't get why one page would take longer than 2 minutes to execute.
6 Replies
helpful-purple
helpful-purple•12mo ago
Check your inbox.
rare-sapphire
rare-sapphire•12mo ago
Hello, can you send me the run link in a private message? I will take a look; this is unusual behavior, and a simple single-page scrape should not take 2 minutes.
ratty-blush
ratty-blushOP•12mo ago
I don't see a way to DM you on Discord, but here's the link anyway: https://console.apify.com/actors/tasks/d0h4jRkVTRHLNDsJG/runs/9AYQvFj6SmbBiuloZ Can you access it? Thanks for sharing your thoughts.
One page, max crawl depth: 0, duration: 3m22s.
Just accepted your connect request; feel free to DM.
rare-sapphire
rare-sapphire•12mo ago
I tried running the actor with the exact same input, and it finished after 18 seconds. Can you try running it again? It's possible there was a very temporary outage, or just weird Docker behavior in the background.
ratty-blush
ratty-blushOP•12mo ago
Yeah, I can see how that could be a one-off due to a slow cold start or something, but it's happened multiple times over multiple days 😦 I'm calling it from Make (with a 2-minute execution limit on the Apify synchronous actor call node). Do you know what the input "requestTimeoutSecs" is supposed to do and whether it would have any impact? And "initialConcurrency" and "maxConcurrency"? (Probably useless for one-page scrapes, but double-checking, as I'm unsure exactly how the implementation works behind the scenes.)
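For context, here's a minimal sketch of what an equivalent synchronous call looks like outside Make, using the Apify Python client; the token placeholder and example URL are assumptions, and timeout_secs here caps the whole run roughly the way Make's 2-minute node limit does:

```python
# A minimal sketch, assuming the official apify-client package
# (pip install apify-client) and a placeholder API token.
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")  # placeholder, not a real token

run_input = {
    "startUrls": [{"url": "https://example.com"}],  # assumed example URL
    "maxCrawlDepth": 0,  # one page only, as in the run above
}

# call() blocks until the run finishes; timeout_secs aborts the run
# server-side, roughly mirroring Make's 2-minute execution limit.
run = client.actor("apify/website-content-crawler").call(
    run_input=run_input,
    timeout_secs=120,
)
print(run["status"] if run else "no run info returned")
```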
rare-sapphire
rare-sapphire•12mo ago
requestTimeoutSecs sets how long the actor waits on a single request before it times out and either retries or fails; lowering it won't make a successful request finish faster, it only caps how long a slow or failing one can hang. The concurrency fields control how many pages are scraped at the same time, so they're not something you need to worry about when you scrape only one page at a time.
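To make the distinction concrete, here's a sketch of how those fields sit in the actor input; the values are purely illustrative:

```python
# Illustrative input values only; for a single page the concurrency
# fields have effectively no work to parallelize.
run_input = {
    "startUrls": [{"url": "https://example.com"}],
    "maxCrawlDepth": 0,
    "requestTimeoutSecs": 60,  # give up on a page after 60 s, then retry or fail
    "initialConcurrency": 1,   # parallel page slots at the start of the run...
    "maxConcurrency": 1,       # ...and the ceiling; irrelevant for one page
}
```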
