Error during cleanup in PlaywrightCrawler
I use PlaywrightCrawler with
headless=True
The package I use is crawlee[playwright]==0.6.1.
While the crawler is waiting for remaining tasks to finish, it sometimes hits an error like the one in the screenshot. Is this something that can be resolved easily?
I ask because I think this error is related to another issue I have.
My code has its own batching system in place, but I noticed that memory usage slowly increases with each batch.
After some investigation I saw that ps -fC headless_shell
listed a lot of headless_shell processes marked <defunct> (zombie processes). So I assume this is related to the cleanup that fails on each crawl.
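(To reproduce this check, a rough helper along these lines counts the defunct processes. This snippet is mine, not from the original post, and assumes a Linux host where ps supports -C:)

```python
import subprocess

# List headless_shell processes and count the ones marked <defunct>.
result = subprocess.run(
    ['ps', '-fC', 'headless_shell'],
    capture_output=True,
    text=True,
)
zombies = [line for line in result.stdout.splitlines() if '<defunct>' in line]
print(f'{len(zombies)} defunct headless_shell processes')
```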
Below you can see my code for the batching system:
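(The original snippet was posted as an attachment and isn't reproduced here. The sketch below is a hypothetical reconstruction of the pattern described, a fresh PlaywrightCrawler with headless=True run per batch, with placeholder URLs, batch size, and handler body, assuming the crawlee 0.6.x import paths:)

```python
import asyncio

from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext


async def crawl_batch(urls: list[str]) -> None:
    # One fresh crawler (and browser) per batch.
    crawler = PlaywrightCrawler(headless=True)

    @crawler.router.default_handler
    async def handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url}')
        # ... scrape and push data here ...

    await crawler.run(urls)


async def main() -> None:
    # Placeholder URL list and batch size.
    all_urls = [f'https://example.com/page/{i}' for i in range(100)]
    batch_size = 10
    for start in range(0, len(all_urls), batch_size):
        await crawl_batch(all_urls[start : start + batch_size])


if __name__ == '__main__':
    asyncio.run(main())
```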
generous-apricotOP•3mo ago
🤦♂️ Forgot to upload the screenshot

generous-apricotOP•3mo ago
UPDATE:
Noticed this PR: https://github.com/apify/crawlee-python/pull/1046
This will fix my initial issue. Hopefully this will also fix the zombie processes on each batch 🙏
correct-apricot•3mo ago
Yes, unfortunately this bug did not show up in tests during development, and I only discovered it while testing the release on one of my projects 😢
I think the fix should help with the zombie processes, since the error during file cleanup prevented the browser shutdown from completing correctly. But if the problem persists after the PR is released, feel free to open an Issue in the repository.
@ROYOSTI This should already be available in the beta release crawlee==0.6.3b3.
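(For reference, pinning the exact pre-release version should pull it in; this assumes the same playwright extra mentioned at the top of the thread:)

```
pip install "crawlee[playwright]==0.6.3b3"
```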
If you decide to try this, please let me know if you observe any problems.
generous-apricotOP•3mo ago
@Mantisus, I did a small rerun with crawlee==0.6.3b4. The issue with removing the tmp folder for PlaywrightCrawler is solved.
But each batch still leaves behind a lot of zombie processes.
Could I change something in my code to prevent this, or is this something I should report as an Issue in the repository?
correct-apricot•3mo ago
Got it. Yes, please report it as an Issue in the repository.
You can try use_incognito_pages=True; it may improve the situation with the zombie processes (but it will slow down your crawler, since the browser cache will no longer be shared between requests).
I am not sure, though: if this is not related to the crash caused by the file-closing error, we will need to study the situation in detail.
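(A minimal sketch of where that flag would go, assuming use_incognito_pages is accepted by the PlaywrightCrawler constructor in crawlee 0.6.x as the message above suggests:)

```python
from crawlee.crawlers import PlaywrightCrawler

# Each request gets its own incognito browser context, so no cache or
# cookies are shared between requests; contexts are torn down per page,
# which may help with lingering browser processes.
crawler = PlaywrightCrawler(
    headless=True,
    use_incognito_pages=True,
)
```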