Crawl does not work for a specific website, but scrape does

I’m trying to crawl this URL: https://www.investopedia.com/terms/r/request-for-proposal.asp/ Yes, including the trailing /. The crawl endpoint is returning nothing, while the scrape endpoint retrieves the correct information. Shouldn't both work?
7 Replies
micah.stairs (2d ago)
To crawl the links on that starting URL, you need to set crawlEntireDomain to true. If crawling the entire domain isn't desirable for your use case, you can alternatively pick a different starting URL (with fewer slashes).
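For reference, a minimal sketch of what a crawl request body with that option might look like. Only `crawlEntireDomain` comes from this thread; the `url` key and the helper name `build_crawl_payload` are assumptions for illustration, so check the Firecrawl API reference for the exact request shape before relying on it.

```python
import json


def build_crawl_payload(start_url: str, entire_domain: bool = False) -> dict:
    """Build a JSON-serializable body for a crawl request (hypothetical helper)."""
    # The "url" key is an assumption; "crawlEntireDomain" is the option named above.
    payload = {"url": start_url}
    if entire_domain:
        payload["crawlEntireDomain"] = True
    return payload


payload = build_crawl_payload(
    "https://www.investopedia.com/terms/r/request-for-proposal.asp/",
    entire_domain=True,
)
print(json.dumps(payload, indent=2))
```

The option is only added when requested, so the default body stays limited to the starting URL.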
Lucas Mendonça (OP, 2d ago)
I don’t want to crawl the entire website, only this specific endpoint and its children (if any). This specific sample should return a single result, just like the website itself, but it’s currently returning an empty result and not even scraping that page.
micah.stairs (2d ago)
Ah I see. Yeah, that does seem to be undesirable behavior. As a workaround, can you just crawl www.investopedia.com/terms/r/request-for-proposal.asp? That returns one result as expected.
Lucas Mendonça (OP, 2d ago)
Not exactly. I added the trailing / to handle several edge cases, so the standard for all my URLs is now that every one of them ends with /.
Lucas Mendonça (OP, 2d ago)
I am running a few tests, and whether it works or not seems really arbitrary. Look at these samples:
https://nicocarlier.github.io/info/ -> this returns the current page
https://www.inventive.ai/blog/ -> this doesn't retrieve anything
micah.stairs (2d ago)
Is there any reason you can't just strip off the slash before passing into Firecrawl? That seems like the easiest short-term solution.
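A minimal sketch of that workaround, using only Python's standard library. The function name `strip_trailing_slash` is made up for illustration, and whether Firecrawl treats the stripped URL the way you need is something to verify against your own URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_trailing_slash(url: str) -> str:
    """Remove trailing slashes from the path, keeping query and fragment intact."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


start_url = strip_trailing_slash(
    "https://www.investopedia.com/terms/r/request-for-proposal.asp/"
)
print(start_url)
```

Working on the parsed path rather than the raw string avoids touching a slash that belongs to a query string or fragment.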
Lucas Mendonça (OP, 2d ago)
I’m not sure about the URL example right now. However, the Firecrawl API was adding a trailing / to the URL, even on the sourceURL. Since I query my database by URL, it wasn’t finding any matching entries.
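One way to make lookups like this robust is to normalize both the stored URL and the returned sourceURL to the same canonical form before comparing. The sketch below follows the "always ends with /" convention from earlier in the thread; `canonical_key` and the in-memory `records` dict are hypothetical stand-ins for the real database query.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_key(url: str) -> str:
    """Return the URL with exactly one trailing slash on its path."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Hypothetical stand-in for a database table keyed by URL.
records = {canonical_key("https://nicocarlier.github.io/info/"): "row-1"}

# A sourceURL that comes back with or without the slash maps to the same key.
source_url = "https://nicocarlier.github.io/info"
match = records.get(canonical_key(source_url))
print(match)
```

Normalizing on both write and read means the lookup no longer depends on which form the API happens to return.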
