Cannot fully crawl https://ordwaylabs.stoplight.io/
Using the map endpoint on the playground returns only 9 results (despite first showing a "This job contains over 500 documents" warning). I also tried this website with Firecrawl's Python API, and it returns only 4 results. I'm expecting many, many more.
@Caleb @Adobe.Flash this is fairly time-sensitive on my end, so if you don't think it's a quick fix, please let me know whether I should try an alternative crawling service for this one
What is the site?
We will look into it ASAP
Ah, I see it in the title
@rafaelmiller can you create a ticket?
taking a look right now
Thanks @rafaelmiller, I really appreciate it! With the crawl endpoint, I'm possibly getting more success by adding a wait action
yeah. I got 19 results with wait 5000
checking with other parameters now
Same, but there are a lot more pages than that
I also have maxDepth set to 10 (instead of the default 2)
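For reference, the settings mentioned above (a wait action of 5000 plus maxDepth set to 10) could be written roughly as the request body below. This is only a sketch: `build_crawl_request` is a hypothetical helper, and the field names (`scrapeOptions`, `actions`, `milliseconds`) are assumptions based on Firecrawl's v1 crawl API, so check them against the current docs before relying on them. It only builds and prints the payload; it does not call the API.

```python
import json


def build_crawl_request(url, max_depth=10, wait_ms=5000):
    """Assemble a crawl request body that waits before each page is scraped.

    Hypothetical helper: the field names are assumed from Firecrawl's
    v1 crawl API and are not confirmed anywhere in this thread.
    """
    return {
        "url": url,
        # Default mentioned in the thread is 2; raised to 10 here.
        "maxDepth": max_depth,
        "scrapeOptions": {
            "formats": ["markdown"],
            # Give the page time to render before it is scraped.
            "actions": [{"type": "wait", "milliseconds": wait_ms}],
        },
    }


if __name__ == "__main__":
    body = build_crawl_request("https://ordwaylabs.stoplight.io/")
    print(json.dumps(body, indent=2))
```

As the results in this thread suggest, waiting alone only helps with content that renders late; it does not reveal links that need a click.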
I figured out why Firecrawl is not finding the links. The links inside the docs page (navigation bar) are loaded only when you click on it
Checking with our scraping engineer whether there's an option to click every item in the navbar so the crawler can see the links it has to follow
Okay thanks for the update!
@rafaelmiller do you have any further updates or should I expect an update on Monday? Thanks!
Hey @micah.stairs sorry for the delay in getting back to you. To resolve this issue, we’ll need to implement a "click all" feature within actions. I’ve added a GitHub issue for prioritization: https://github.com/mendableai/firecrawl/issues/854.
Okay! And to clarify, does the action get performed before Firecrawl looks for links to traverse as part of the crawl? I was under the impression it just affected what data was scraped from that page
Yes, actions are performed before Firecrawl retrieves the links on a page. This means any elements clicked or interacted with during the action phase can impact which links Firecrawl detects and traverses.
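To make the ordering concrete, here is a toy model of why it matters. It is not Firecrawl's implementation: the page markup, the `LAZY_SECTIONS` table, and the `click` helper are all made up for illustration. It just shows that links injected by a click are visible to link extraction only if the click happens first.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that is fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Hypothetical page: the nav section's links are not in the initial markup.
INITIAL_HTML = '<nav><button id="guides">Guides</button></nav><a href="/intro">Intro</a>'
LAZY_SECTIONS = {
    "guides": '<a href="/guides/auth">Auth</a><a href="/guides/billing">Billing</a>',
}


def click(html, section_id):
    """Simulate a click that injects a section's lazily loaded links."""
    return html + LAZY_SECTIONS[section_id]


links_without_action = extract_links(INITIAL_HTML)
links_after_action = extract_links(click(INITIAL_HTML, "guides"))

print(links_without_action)  # ['/intro']
print(links_after_action)    # ['/intro', '/guides/auth', '/guides/billing']
```

In this model, a crawler that extracted links before running actions would only ever follow `/intro`, which matches the small result counts reported earlier in the thread.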
Okay good to know! Is that properly communicated in the documentation? I don't remember seeing anything about that
I'm not sure either. I'll add it to make the behavior clearer