AutoscaledPool trying to scale up without sufficient memory

Hi all, I'm running a Playwright crawler and am running into an issue with crawler stability. Have a look at the log output below:
{
"service": "AutoscaledPool",
"time": "2024-10-30T16:42:17.049Z",
"id": "cae4950d568a4b8bac375ffa5a40333c",
"jobId": "9afee408-42bf-4194-b17c-9864db707e5c",
"currentConcurrency": "4",
"desiredConcurrency": "5",
"systemStatus": "{\"isSystemIdle\":true,\"memInfo\":{\"isOverloaded\":false,\"limitRatio\":0.2,\"actualRatio\":0},\"eventLoopInfo\":{\"isOverloaded\":false,\"limitRatio\":0.6,\"actualRatio\":0},\"cpuInfo\":{\"isOverloaded\":false,\"limitRatio\":0.4,\"actualRatio\":0},\"clientInfo\":{\"isOverloaded\":false,\"limitRatio\":0.3,\"actualRatio\":0}}"
}
The AutoscaledPool is trying to increase its concurrency from 4 to 5, since the system was, in its view, idle. Twenty seconds later, though:
{
"rejection": "true",
"date": "Wed Oct 30 2024 16:42:38 GMT+0000 (Coordinated Universal Time)",
"process": "{\"pid\":1,\"uid\":997,\"gid\":997,\"cwd\":\"/home/myuser\",\"execPath\":\"/usr/local/bin/node\",\"version\":\"v22.9.0\",\"argv\":[\"/usr/local/bin/node\",\"/home/myuser/FIDO-Scraper-Discovery\"],\"memoryUsage\":{\"rss\":337043456,\"heapTotal\":204886016,\"heapUsed\":168177928,\"external\":30148440,\"arrayBuffers\":14949780}}",
"os": "{\"loadavg\":[3.08,3.38,3.68],\"uptime\":312222.44}",
"stack": "response.headerValue: Target page, context or browser has been closed\n at Page.<anonymous> (/home/myuser/FIDO-Scraper-Discovery/dist/articleImagesPreNavHook.js:15:60)"
}
This suggests memory was much tighter than the AutoscaledPool thought, likely due to the additional RAM that Chromium was using. Crawlee was running in a Kubernetes pod with a 4 GB memory limit. Is this behaviour intended, and how might I improve stability? Does the AutoscaledPool account for how much RAM is actually in use, or just how much the Node process uses?
6 Replies
Hall
Hall•7mo ago
This post has been pushed to the community knowledgebase. Any replies in this thread will be synced to the community site.
xenial-black
xenial-blackOP•7mo ago
Here's a log export from my service. After this, the pod auto-restarts due to hitting the memory limit.
quickest-silver
quickest-silver•7mo ago
The AutoscaledPool doesn't ensure that memory never goes above the limit; it just doesn't scale to more requests when it's close. So a sudden memory spike, like on a very heavy page, can still cause trouble. You can either limit maxConcurrency or play with the autoscaledPoolOptions to reduce memory-driven scaling.
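For reference, capping the pool with maxConcurrency might look like the following sketch (the handler body and URL are placeholders):

```typescript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    // Hard upper bound on parallel pages. The AutoscaledPool will never
    // scale past this, no matter how idle the system looks to it.
    maxConcurrency: 3,
    async requestHandler({ page, request }) {
        // ... your scraping logic
    },
});

await crawler.run(['https://example.com']);
```

A lower cap trades throughput for headroom: each Chromium page can spike memory well beyond what the Node process itself reports.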
xenial-black
xenial-blackOP•7mo ago
But it seems to me that the pool was still trying to scale up even when there was no extra memory to be had?
Pepa J
Pepa J•7mo ago
Hi @Crafty, if the default settings don't work for you, you may adjust the ratios for scaling up via https://crawlee.dev/api/core/interface/AutoscaledPoolOptions in the crawler options.
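Those options are passed through autoscaledPoolOptions on the crawler. A sketch of making scale-ups more conservative, assuming option names from the AutoscaledPoolOptions page linked above (the values here are illustrative, not recommendations):

```typescript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    autoscaledPoolOptions: {
        // Require actual concurrency to sit closer to the desired
        // concurrency before the pool scales up further.
        desiredConcurrencyRatio: 0.98,
        // Grow desired concurrency in smaller increments per scaling tick,
        // so a single scale-up has less room to overshoot memory.
        scaleUpStepRatio: 0.02,
    },
    async requestHandler({ page }) {
        // ... your scraping logic
    },
});
```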
xenial-black
xenial-blackOP•7mo ago
Thanks for these. Eventually I found the Snapshotter's used-memory ratio and turned it down. 🙂
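For anyone finding this later: the setting referred to is presumably the Snapshotter's maxUsedMemoryRatio, which can be passed through autoscaledPoolOptions.snapshotterOptions. A sketch (the 0.5 value is illustrative):

```typescript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    autoscaledPoolOptions: {
        snapshotterOptions: {
            // Treat memory as overloaded at 50% usage, keeping the rest
            // as headroom for Chromium's own allocations.
            maxUsedMemoryRatio: 0.5,
        },
    },
    async requestHandler({ page }) {
        // ... your scraping logic
    },
});
```

In a container it may also help to tell Crawlee the real limit explicitly, e.g. by setting the CRAWLEE_MEMORY_MBYTES environment variable in the pod spec.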
