Mantisus
Crawlee & Apify
•Created by Andrew on 5/12/2025 in #crawlee-python
Scrapping tweets are all mock tweets
Hi, @Andrew
Questions about a specific Actor should be asked on the Actor's page, since its developers may not be in the Discord community:
https://console.apify.com/actors/CJdippxWmn9uRfooo/issues
4 replies
Crawlee & Apify
•Created by optimistic-gold on 5/2/2025 in #crawlee-python
How to send an URL with a label to main file?
Thanks for your example. You can use Request with the from_url constructor for this:
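The code that followed this message was not preserved. As a minimal sketch of the idea, assuming Crawlee for Python is installed: Request.from_url accepts a label argument, and the router sends the request to the handler registered for that label. The label_for helper, the DETAIL and LIST labels, and the example.com URLs are hypothetical and only illustrate the pattern.

```python
from urllib.parse import urlparse


def label_for(url: str) -> str:
    """Pick a handler label from the URL path (hypothetical rule for illustration)."""
    path = urlparse(url).path
    return 'DETAIL' if path.startswith('/product/') else 'LIST'


def build_request(url: str):
    """Create a Crawlee Request that carries the label chosen above."""
    # Crawlee is imported here, so it is only required once a request is built.
    from crawlee import Request

    return Request.from_url(url, label=label_for(url))
```

Requests built this way can be passed to crawler.run([...]) or context.add_requests([...]); a handler registered with @crawler.router.handler('DETAIL') would then receive the labeled ones.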
6 replies
Crawlee & Apify
•Created by extended-salmon on 5/2/2025 in #crawlee-python
How to send an URL with a label to main file?
Sorry, could you please give an example of the code? I don't quite understand your question.
6 replies
Crawlee & Apify
•Created by sunny-green on 8/14/2024 in #crawlee-python
How to save network requests made by the webpage I am scraping?
Hey @uandsaeed
To capture network traffic, you can use Playwright with the record_har_path parameter.
8 replies
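For the network-capture answer above, a minimal sketch using plain Playwright for Python (assumed to be installed, with a Chromium browser available). record_har_path is an option of browser.new_context; the capture_har and list_request_urls helpers and the file names are hypothetical.

```python
import json


async def capture_har(url: str, har_path: str) -> None:
    """Load a page with Playwright and record its network traffic to a HAR file."""
    # Playwright is imported here, so it is only needed when a capture runs.
    from playwright.async_api import async_playwright

    async with async_playwright() as playwright:
        browser = await playwright.chromium.launch()
        # record_har_path tells Playwright to log the requests made in this context.
        context = await browser.new_context(record_har_path=har_path)
        page = await context.new_page()
        await page.goto(url)
        # The HAR file is written when the context is closed.
        await context.close()
        await browser.close()


def list_request_urls(har_path: str) -> list[str]:
    """Return the URL of every request recorded in a HAR file."""
    with open(har_path, encoding='utf-8') as har_file:
        har = json.load(har_file)
    return [entry['request']['url'] for entry in har['log']['entries']]
```

A run could look like asyncio.run(capture_har('https://example.com', 'network.har')), followed by list_request_urls('network.har') to inspect what the page requested.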
Crawlee & Apify
•Created by exotic-emerald on 4/29/2025 in #crawlee-python
structlog support?
But stderr is also one of the standard outputs of a standard logger.
I haven't used structlog, but given that it can wrap the standard logger, I don't see any problems with that, or any need for dirty tricks.
5 replies
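On the stderr remark in the reply above: a short standard-library check showing that logging.StreamHandler writes to sys.stderr unless another stream is passed. The variable names are only for illustration.

```python
import logging
import sys

# With no arguments, StreamHandler writes to sys.stderr.
handler = logging.StreamHandler()
writes_to_stderr = handler.stream is sys.stderr

# Passing a stream explicitly sends the output to stdout instead.
stdout_handler = logging.StreamHandler(sys.stdout)
writes_to_stdout = stdout_handler.stream is sys.stdout
```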
Crawlee & Apify
•Created by frail-apricot on 4/29/2025 in #crawlee-python
structlog support?
Hey @Rykari
Crawlee uses a standard logger, so you can plug in structlog by following its official documentation - https://www.structlog.org/en/stable/standard-library.html
The Crawlee documentation has an example of connecting loguru, which you can use as a reference - https://crawlee.dev/python/docs/examples/configure-json-logging
5 replies
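For the logger answer above, a standard-library sketch of the general mechanism: because the logger is a regular logging logger, a custom handler and formatter can be attached to it. This assumes Crawlee's loggers are children of a logger named crawlee. JsonFormatter and attach_json_handler are hypothetical stand-ins; with structlog, the formatter from its standard-library integration guide would take that place.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object (stand-in for a structlog renderer)."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
        })


def attach_json_handler(logger_name: str, stream) -> logging.Handler:
    """Send records from the named logger (and its children) to `stream` as JSON."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(logger_name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return handler


# Assumption: Crawlee's loggers are children of the 'crawlee' logger.
buffer = io.StringIO()
attach_json_handler('crawlee', buffer)
logging.getLogger('crawlee.example').info('page processed')
logged = json.loads(buffer.getvalue())
```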
Crawlee & Apify
•Created by exotic-emerald on 4/23/2025 in #crawlee-python
Memory is critically overloaded
Hi @ROYOSTI
I recommend reporting this in the repository, since it could be either another peculiarity of the AWS setup or a bug in the resource counter.
5 replies
Crawlee & Apify
•Created by fascinating-indigo on 4/17/2025 in #crawlee-python
Routers not working as expected
The default is same-hostname.
However, the links to the PDF in your case are on a different host.
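A small standard-library illustration of what a same-hostname check implies. This is not Crawlee's implementation, and the URLs and the same_hostname helper are hypothetical.

```python
from urllib.parse import urlparse


def same_hostname(page_url: str, link_url: str) -> bool:
    """Return True when both URLs share exactly the same hostname."""
    return urlparse(page_url).hostname == urlparse(link_url).hostname


page = 'https://example.com/reports'
pdf_link = 'https://files.example.com/report.pdf'  # hypothetical PDF host
would_be_followed = same_hostname(page, pdf_link)
```

Here would_be_followed is False, which is why such links are skipped by default. In Crawlee, enqueue_links takes a strategy argument, and a wider strategy (for example same-domain or all) should let links like this through.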
5 replies
Crawlee & Apify
•Created by conscious-sapphire on 4/17/2025 in #crawlee-python
Routers not working as expected
Hey @Matheus Rossi
Thank you for your interest in the framework!
Try using
5 replies
Crawlee & Apify
•Created by conscious-sapphire on 4/15/2025 in #crawlee-python
Dynamically change dataset id based on root_domain
Hey @Rykari
Take a look at the Dataset class - https://crawlee.dev/python/api/class/Dataset
You can open different Datasets in handlers and write data to them.
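A minimal sketch of that approach, assuming Crawlee for Python is installed. Dataset.open(name=...) and push_data are part of the Dataset class; the dataset_name_for and save_item helpers are hypothetical, and the root-domain rule here is deliberately naive (it keeps the last two hostname labels, so it does not handle suffixes such as co.uk).

```python
from urllib.parse import urlparse


def dataset_name_for(url: str) -> str:
    """Build a dataset name from the last two labels of the URL's hostname."""
    hostname = urlparse(url).hostname or 'unknown'
    root_domain = '.'.join(hostname.split('.')[-2:])
    # Dots are replaced on the assumption that dataset names should avoid them.
    return root_domain.replace('.', '-')


async def save_item(url: str, item: dict) -> None:
    """Write `item` to the dataset that belongs to the URL's root domain."""
    # Crawlee is imported here, so it is only required when data is stored.
    from crawlee.storages import Dataset

    dataset = await Dataset.open(name=dataset_name_for(url))
    await dataset.push_data(item)
```

Inside a request handler this could be called as await save_item(context.request.url, {'title': ...}).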
5 replies
Crawlee & Apify
•Created by quickest-silver on 4/9/2025 in #crawlee-python
Handling of 4xx and 5xx in default handler (Python)
Could you give examples of the kind of behavior you want to achieve?
Perhaps error_handler is better for your case:
https://crawlee.dev/python/api/class/BasicCrawler#error_handler
7 replies
Crawlee & Apify
•Created by foreign-sapphire on 4/9/2025 in #crawlee-python
Handling of 4xx and 5xx in default handler (Python)
You need to include all of them. Something like:
7 replies
Crawlee & Apify
•Created by ratty-blush on 4/9/2025 in #crawlee-python
Handling of 4xx and 5xx in default handler (Python)
Hey @rast42
By default, Crawlee has its own behavior for handling error statuses:
5xx - cause a retry
403, 429, 401 - cause session rotation, if sessions are used
other 4xx - marked as failed without retrying
If you want to handle any statuses yourself, you can use ignore_http_error_status_codes.
7 replies
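The default rules listed in the reply above can be restated as a small function. This is only an illustration of the description, not Crawlee's actual implementation; default_reaction and the returned strings are hypothetical.

```python
SESSION_ROTATION_CODES = {401, 403, 429}


def default_reaction(status_code: int) -> str:
    """Describe the default reaction to an HTTP status, per the rules listed above."""
    # Check the session codes first: they are 4xx too, but are treated differently.
    if status_code in SESSION_ROTATION_CODES:
        return 'rotate session'
    if 500 <= status_code <= 599:
        return 'retry'
    if 400 <= status_code <= 499:
        return 'fail without retry'
    return 'handle normally'
```

Status codes listed in ignore_http_error_status_codes are meant to bypass these rules, so the response reaches the request handler and can be handled there.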
Crawlee & Apify
•Created by generous-apricot on 4/6/2025 in #crawlee-python
Camoufox and adaptive playwright
The browser_pool is set with playwright_crawler_specific_kwargs, but I don't have a way to test running it with Camoufox right now. However, if it is not supported, that is an error.
7 replies
Crawlee & Apify
•Created by equal-aqua on 4/6/2025 in #crawlee-python
Camoufox and adaptive playwright
Hey, @Doigus
Could you create an issue with an example of the error you're getting and more context? https://github.com/apify/crawlee-python/issues
7 replies
Crawlee & Apify
•Created by sensitive-blue on 3/22/2025 in #crawlee-python
Proxy example with PlaywrightCrawler
Hey
The documentation has examples of using PlaywrightCrawler with a proxy - https://crawlee.dev/python/docs/guides/proxy-management#crawler-integration
Try changing the proxies you are using. Judging by your error, I think your proxies have some kind of certificate conflict with Instagram.
3 replies
Crawlee & Apify
•Created by stormy-gold on 3/13/2025 in #apify-platform
Uncaught exception during the run of the Actor
Hey @Arindam
A new release was made today that should fix this.
Try setting crawlee==0.6.5
https://github.com/apify/crawlee-python/releases/tag/v0.6.5
5 replies
Crawlee & Apify
•Created by automatic-azure on 3/7/2025 in #crawlee-python
Selenium + Chrome Instagram Scraper cannot find the Search button when I run it in Apfiy..
Hey.
In a case like this, I would recommend starting by debugging with screenshots: take them while running on the Apify platform and save them to the key-value store. That way you can get a better understanding of what the problem is. Also test the crawler locally with the same proxy configuration.
Any crawler may work differently locally and in the cloud, for example because of the proxy (if you didn't use a proxy locally). For example, YouTube doesn't show some popup windows for me because Ukraine doesn't have GDPR. However, when the crawler works through European proxies, these popups will appear.
4 replies
Crawlee & Apify
•Created by vicious-gold on 3/5/2025 in #crawlee-python
Error on cleanup PlaywrightCrawler
You can try using use_incognito_pages=True; maybe it will improve the situation with zombie processes (but it will reduce the speed of your crawler, as there will be no browser cache sharing between different requests).
But I am not sure, because if this is not related to the crash caused by the file closing error, we need to study the situation in detail.
10 replies
Crawlee & Apify
•Created by conscious-sapphire on 3/5/2025 in #crawlee-python
Error on cleanup PlaywrightCrawler
Got it. Yes, please report it as an issue in the repository.
10 replies