Incomplete scraping
I'm trying to capture the links from a page's table of contents. Using pyppeteer with Chromium returns the entire page contents, but Firecrawl's scrape and crawl do not. I've tried both the Firecrawl API and the self-hosted version, and neither one returns a complete response.
It doesn't look like Playwright is being used at all, but the logging doesn't work very well once everything is running, so I'm having a hard time troubleshooting. I know my Playwright service is up, and I have the waitFor page option enabled in my scrape query, yet the request still doesn't seem to go through Playwright.
The URL is:
https://docs.venafi.com/Docs/currentSDK/TopNav/Content/SDK/WebSDK/r-SDK-CertificatesModuleProgramming-Interfaces.php?tocpath=Web%20SDK%20REST%7CCertificate%20endpoints%20for%20TLS%7CCertificates%20API%7C_____0
Thanks in advance!
1 Reply
https://www.firecrawl.dev/playground?url=https%3A%2F%2Fdocs.venafi.com%2FDocs%2FcurrentSDK%2FTopNav%2FContent%2FSDK%2FWebSDK%2Fr-SDK-CertificatesModuleProgramming-Interfaces.php%3Ftocpath%3DWeb%2520SDK%2520REST%257CCertificate%2520endpoints%2520for%2520TLS%257CCertificates%2520API%257C_____0&mode=scrape&limit=10&excludes=&includes=&returnOnlyUrls=&ignoreSitemap=&maxDepth=&onlyMainContent=false&includeHtml=false&removeTags=&onlyIncludeTags=&waitFor=4000
Hey! I added a waitFor parameter of 4000 (milliseconds) and it seemed to get all the content. It's definitely running Playwright in the background.
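For anyone who wants to try the same settings outside the playground, here is a rough sketch of the equivalent request from Python. It assumes the v0-style scrape endpoint and a `pageOptions` object carrying `waitFor`, mirroring the playground parameters above; newer API versions may name or place these fields differently. The helper names `build_scrape_payload` and `scrape` are made up for illustration, and the API key is a placeholder.

```python
import json
import urllib.request

# Assumption: v0-style hosted endpoint; check your API version before relying on it.
SCRAPE_ENDPOINT = "https://api.firecrawl.dev/v0/scrape"


def build_scrape_payload(url: str, wait_ms: int = 4000) -> dict:
    """Build a scrape request body that waits before the page is captured."""
    return {
        "url": url,
        # Assumption: these mirror the playground settings shown above.
        "pageOptions": {
            "onlyMainContent": False,
            "includeHtml": False,
            "waitFor": wait_ms,  # milliseconds to wait for the page to render
        },
    }


def scrape(url: str, api_key: str, wait_ms: int = 4000,
           endpoint: str = SCRAPE_ENDPOINT) -> dict:
    """POST the scrape request and return the decoded JSON response."""
    body = json.dumps(build_scrape_payload(url, wait_ms)).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

For a self-hosted instance, pass your own base URL as `endpoint` (for example something like `http://localhost:3002/v0/scrape`, assuming that is the port your instance listens on). If the table of contents still comes back incomplete, raising `wait_ms` is the first thing to try.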