Crawl "Include Only Paths" not working?
I'm trying to scrape the products listed on a catalog page. To do so, I'm setting up a crawl with Include Only Paths (includesPath) set, but the crawl only returns the original catalog URL.
The catalog page / main crawl URL: https://www.ssense.com/en-us/men/sale/clothing
Example product page 1: https://www.ssense.com/en-us/men/product/essentials/black-patch-hoodie/14616841
Example product page 2: https://www.ssense.com/en-us/men/product/auralee/brown-pleated-trousers/14085441
Include Only Paths: en-us/men/product/

In this case I expect to get all 3 pages back, but I only get the catalog page. I've even tried allowing backwards links. Is this a bug or am I missing something?
3 Replies
Did you try /en-us/men/product/* (note the first forward slash and the last asterisk)?

Hi yes, see attached screenshot

I think Max Depth may be the problem.
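
To sanity-check both suggestions locally before re-running the crawl, here is a rough Python sketch. It assumes the crawler treats the include pattern as a regular expression matched against the URL path and counts depth as the number of path segments; both are assumptions, not confirmed behavior, so check the crawler's docs. The names INCLUDE_PATTERN, path_depth, and matches_include are made up for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical include pattern, treated here as a regex applied to the URL path.
INCLUDE_PATTERN = r"^/en-us/men/product/.*"

# URLs taken from the original post.
URLS = [
    "https://www.ssense.com/en-us/men/sale/clothing",
    "https://www.ssense.com/en-us/men/product/essentials/black-patch-hoodie/14616841",
    "https://www.ssense.com/en-us/men/product/auralee/brown-pleated-trousers/14085441",
]


def path_depth(url: str) -> int:
    """Count the non-empty path segments of a URL."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def matches_include(url: str, pattern: str = INCLUDE_PATTERN) -> bool:
    """Return True if the URL's path matches the include pattern."""
    return re.search(pattern, urlparse(url).path) is not None


for url in URLS:
    print(matches_include(url), path_depth(url), url)
```

Under these assumptions, the catalog URL itself does not match the include pattern, and the product pages sit two path segments deeper than the catalog page, so a low Max Depth setting could plausibly exclude them.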