Crawl endpoint missing a page that is both in sitemap and linked
We're crawling a site with allowSubdomains set to true and sitemap set to "include" in the request, but we're still noticing that many pages aren't being crawled. In the past, if we didn't include sitemap, this would happen because a page was orphaned: the recursive crawl couldn't reach it since nothing links to it. With sitemap included, though, shouldn't that be solved?
The /map endpoint is quite cheap and fast; however, it doesn't guarantee that all of a site's webpages will be found, or that the ones it returns are all still valid, since it doesn't do a full crawl of the website. Instead, you could consider using the /crawl endpoint for this use case. You can check out the documentation for it here.
Sorry, I realized I misspoke above: you're using /crawl, not /map. Can you please share your raw request?
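For anyone following along, here is a minimal sketch of what the raw /crawl request body described in the question might look like. The `allowSubdomains` and `sitemap: "include"` fields come from the question itself; the endpoint URL, the `limit` field, and the auth header shape are assumptions, so verify them against the official Firecrawl API docs before use.

```python
import json

# Hypothetical /crawl request body matching the question's setup.
# Field names beyond allowSubdomains and sitemap are assumptions.
payload = {
    "url": "https://example.com",
    "allowSubdomains": True,   # follow links onto subdomains of the start URL
    "sitemap": "include",      # also seed the crawl with URLs from the sitemap
    "limit": 5000,             # hypothetical cap on pages crawled
}

print(json.dumps(payload, indent=2))

# Sending it would look roughly like this (needs an API key; the exact
# endpoint path may differ between API versions):
#
# curl -X POST https://api.firecrawl.dev/v2/crawl \
#   -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d '{"url": "https://example.com", "allowSubdomains": true, "sitemap": "include"}'
```

Sharing a payload like this (with the real target URL) should let support confirm whether the sitemap option is actually being picked up by the crawl.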