Help with www subdomain
How do I allow www. subdomains with "same-hostname"? Some links are prefixed with www. but point to the same website.
I think www subdomains should be treated as "same-hostname" in enqueueLinks.
www.xyz.com is always the same as xyz.com, and vice versa.
3 Replies
other-emerald•3y ago
See the enqueueLinks strategy option.
same-domain: Matches any URLs that have the same (sub-)domain as the base URL. For example, https://wow.an.example.com and https://example.com will both be matched for a base URL of https://example.com.
Example: Filtering links to same domain [1]
[1] Adding more URLs | Crawlee: https://crawlee.dev/docs/introduction/adding-urls#filtering-links-to-same-domain
sunny-greenOP•3y ago
The problem with 'same-domain' is that it pulls in all subdomains, which is not my goal. I only want the main site.
I solved it with some link filters using cheerio and some regex.
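For anyone landing here later, a minimal sketch of what such a filter might look like. This is not the poster's actual code; the helper names (normalizeHost, isSameSite, filterSameSite) and the xyz.com URLs are hypothetical, and it assumes that only a single leading "www." label should be ignored when comparing hostnames.

```typescript
// Hypothetical helper: strip one leading "www." label so that
// www.xyz.com and xyz.com compare as the same host.
function normalizeHost(hostname: string): string {
    return hostname.toLowerCase().replace(/^www\./, '');
}

// True when `link` (absolute or relative) resolves to the same host
// as `baseUrl`, ignoring a leading "www."; other subdomains do not match.
function isSameSite(link: string, baseUrl: string): boolean {
    try {
        const base = new URL(baseUrl);
        const target = new URL(link, base);
        // Skip mailto:, javascript:, tel: and similar non-web links.
        if (target.protocol !== 'http:' && target.protocol !== 'https:') {
            return false;
        }
        return normalizeHost(target.hostname) === normalizeHost(base.hostname);
    } catch (_err) {
        // Unparseable URLs are treated as non-matching.
        return false;
    }
}

// Keep only the links on the main site and return them as absolute URLs.
function filterSameSite(links: string[], baseUrl: string): string[] {
    return links
        .filter((link) => isSameSite(link, baseUrl))
        .map((link) => new URL(link, baseUrl).href);
}

const kept = filterSameSite(
    [
        'https://www.xyz.com/a',
        'https://xyz.com/b',
        'https://blog.xyz.com/c',
        '/d',
        'mailto:me@xyz.com',
    ],
    'https://xyz.com/',
);
console.log(kept);
```

In a crawler, the hrefs would presumably be collected from the page with cheerio first and the filtered list then handed to enqueueLinks; how exactly that is wired up depends on the setup, so treat that part as an assumption.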
xenial-black•3y ago
Good point, I will raise an issue to have www treated as the same hostname.