SAME_HOSTNAME not working on non-www URLs

When using EnqueueStrategy.SAME_HOSTNAME, I noticed that it does not work properly on non-www URLs. In the debugger I saw that origin is passed to _check_enqueue_strategy, but context.request.loaded_url is used instead when it is available. As a result, every URL that is checked mismatches because the hostnames differ. I tested this with multiple URLs, with and without the www prefix, and got the same behaviour.
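To illustrate the mismatch, here is a minimal sketch using only the standard library. It is not Crawlee's actual implementation: same_hostname and same_hostname_ignoring_www are hypothetical helpers, and the example assumes the server redirected the non-www start URL to the www host, so the loaded URL and the links on the page differ only by the "www." prefix.

```python
from urllib.parse import urlparse


def same_hostname(reference_url: str, target_url: str) -> bool:
    """Hypothetical strict check: hostnames must match exactly."""
    return urlparse(reference_url).hostname == urlparse(target_url).hostname


def normalized_hostname(url: str) -> str:
    """Hypothetical normalization: drop one leading 'www.' from the hostname."""
    # urlparse(...).hostname is already lower-cased (or None if missing).
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def same_hostname_ignoring_www(reference_url: str, target_url: str) -> bool:
    """Hypothetical lenient check that treats www.example.com as example.com."""
    return normalized_hostname(reference_url) == normalized_hostname(target_url)


# Assumed scenario: the request was made to the non-www URL, but the
# loaded URL (after a redirect) is on the www host.
requested = "https://example.com/"
loaded = "https://www.example.com/"
link = "https://example.com/about"

print(same_hostname(requested, link))               # True
print(same_hostname(loaded, link))                  # False: differs only by "www."
print(same_hostname_ignoring_www(loaded, link))     # True
```

Stripping "www." is only one possible workaround and an assumption on my part; in principle example.com and www.example.com can be different hosts serving different content, which is why a strict same-hostname check treats them as distinct.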
Hall (4mo ago)
Someone will reply to you shortly. In the meantime, this might help:
This post was marked as solved by ROYOSTI.
Mantisus (4mo ago)
Hi @ROYOSTI, feel free to create an issue about this bug: https://github.com/apify/crawlee-python/issues