Is it recommended to use Crawlee without the Apify CLI?
How can I change/save the logger that the context provides?
context.log
I want to change or save the logger that is used, because I want to keep its output. I am using Crawlee without the Apify CLI.
How can I add my own cookies to the crawler
SAME_HOSTNAME not working on non www URLs
EnqueueStrategy.SAME_HOSTNAME
I noticed it does not work properly on non-www URLs.
In the debugger I noticed it passes origin to _check_enqueue_strategy, but it uses context.request.loaded_url if available.
So every URL that is checked will mismatch because of the difference in hostname
...
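One possible reading of the mismatch described above can be sketched with the standard library alone. This is not Crawlee's implementation: `same_hostname` is a hypothetical helper, the `example.com` URLs are made up, and the redirect from the bare domain to `www` is an assumption about what happened in the thread.

```python
from urllib.parse import urlparse


def same_hostname(origin_url: str, target_url: str) -> bool:
    """Hypothetical helper: True only when both hostnames are identical."""
    return urlparse(origin_url).hostname == urlparse(target_url).hostname


# Assumed scenario: the bare domain was requested, the server redirected to www.
requested_url = "https://example.com/"
loaded_url = "https://www.example.com/"
link_on_page = "https://example.com/about"

# Comparing against the URL that was requested: the hostnames match.
matches_requested = same_hostname(requested_url, link_on_page)

# Comparing against the URL that was finally loaded: "www." makes them differ.
matches_loaded = same_hostname(loaded_url, link_on_page)

print(matches_requested, matches_loaded)
```

If the strict comparison is the cause, a looser strategy that compares registered domains rather than full hostnames would tolerate the `www` prefix; whether that fits depends on the site being crawled.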
Testing my first actor
Chromium sandboxing failed
Not scheduling new tasks - system is overloaded - GCP Cloud Run error
Enqueue_links only on match in url path? Cancel request in pre_navigation_hook?
Does Crawlee crawl both root-relative and base-relative urls?
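For reference, the two kinds of relative link in the question resolve differently against the page URL. The sketch below uses only `urllib.parse.urljoin` from the standard library and a made-up `example.com` page; it illustrates the terminology and makes no claim about how Crawlee resolves links internally.

```python
from urllib.parse import urljoin

# Hypothetical page on which the links were found.
page_url = "https://example.com/docs/guide/index.html"

# Root-relative: starts with "/", resolved against the site root.
root_relative = urljoin(page_url, "/pricing")

# Base-relative: no leading "/", resolved against the page's directory.
base_relative = urljoin(page_url, "faq.html")

# Base-relative with a parent segment.
parent_relative = urljoin(page_url, "../api/")

print(root_relative)
print(base_relative)
print(parent_relative)
```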
Double log output
In main.py logging works as expected, however in routes.py logging is printed twice for some reason.
I did not set up any custom logging, I just use
Actor.log.info("STARTING A NEW CRAWL JOB")
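A common generic cause of doubled output in Python logging is a logger that has its own handler and also propagates records to a root logger with a handler. The sketch below reproduces that effect with the standard library only. It is not a diagnosis of this thread: the logger name `demo_routes` and the in-memory stream are assumptions made for illustration.

```python
import io
import logging

stream = io.StringIO()  # in-memory sink so the output can be counted

# The root logger gets one handler.
root = logging.getLogger()
root_handler = logging.StreamHandler(stream)
root.addHandler(root_handler)
root.setLevel(logging.INFO)

# A child logger (hypothetical name) gets a second handler of its own.
child = logging.getLogger("demo_routes")
child_handler = logging.StreamHandler(stream)
child.addHandler(child_handler)

# The record is emitted by the child's handler and again by the root's handler.
child.info("STARTING A NEW CRAWL JOB")
doubled = stream.getvalue().count("STARTING A NEW CRAWL JOB")

# Turning off propagation leaves only the child's own handler.
child.propagate = False
stream.seek(0)
stream.truncate(0)
child.info("STARTING A NEW CRAWL JOB")
single = stream.getvalue().count("STARTING A NEW CRAWL JOB")

# Clean up the global logging state changed above.
root.removeHandler(root_handler)
child.removeHandler(child_handler)
child.propagate = True

print(doubled, single)
```

Inspecting `logger.handlers` on the logger in question and on its parents is a quick way to check whether this pattern applies.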
example:...
Clean way to stop "request queue seems to be stuck for 300.0"
WARN, which spawns another Playwright instance.
This probably happens since I only handle 1 request (I do not add anything to the RequestQueue), from which I just have a while until finished condition is met.
```
[crawlee.storages._request_queue] WARN The request queue seems to be stuck for 300.0s, resetting internal state. ({"queue_head_ids_pending": 0, "in_progress": ["tEyKIytjmqjtRvA"]})...
```
How to pass data to routes.py
Crawlee with multiple Crawlers?
router = Router[BeautifulSoupCrawlingContext]()
router = Router[BeautifulSoupCrawlingContext]()
Extracting a website's URLs, prioritizing URLs in the footer
ImportError: cannot import name 'service_container' from 'crawlee'
Parsel Crawler way too dank with request speed
Playwright Crawler on Windows?
Python Session Tracking
Issue with Residential Proxies