So what do you guys suggest in this scenario? I would like to continue using firecrawl, but if it can't crawl some domains, that's very bad for me.
6 Replies
you can try allowing backwards links
but I don't want backward-linked pages
just to clarify what you need, do you need to crawl through all the sublinks of https://docs.tokenterminal.com/ ?
Not sure why, but for this link, if you allow backward links, it looks like it gets all the pages it misses with a plain crawl
this would work if you need to get all the sublinks of the website
just to clarify, allowing backward links won't follow links to external websites
so what should be the general algorithm?
Because according to firecrawl's documentation, allowing backward links is only needed if we want to crawl a URL which is not a sublink of the starting URL
but in this case all of them are sublinks
I sometimes need to ignore backward links as well, for example for https://medium.com/etherfi I don't want to get all possible medium.com URLs
only the etherfi posts
yeah, for the tokenterminal docs, allowing backward links works
but for https://medium.com/etherfi, I can't seem to make it work
I'm not a dev or very technical, but let me look at the source code to see if I can understand it and give you an answer on that, because I'm not sure either
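For anyone following along, here is a rough sketch of the kind of "general algorithm" being asked about. It is not firecrawl's actual source code, just an illustration of how a crawler could decide which links to follow. The names `is_sublink`, `should_crawl`, `allow_backward_links` and `include_prefixes` are made up for this example, and the post URLs are hypothetical.

```python
from urllib.parse import urlparse


def _under_prefix(path: str, prefix: str) -> bool:
    """True if `path` equals `prefix` or sits below it as a path segment."""
    prefix = prefix.rstrip("/")
    path = path.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def is_sublink(start_url: str, candidate: str) -> bool:
    """True if `candidate` is on the same host and under the start URL's path."""
    start, cand = urlparse(start_url), urlparse(candidate)
    if start.netloc.lower() != cand.netloc.lower():
        return False
    return _under_prefix(cand.path, start.path)


def should_crawl(start_url, candidate, allow_backward_links=False, include_prefixes=()):
    """Decide whether to follow `candidate` (hypothetical rules, see lead-in)."""
    start, cand = urlparse(start_url), urlparse(candidate)
    # Links to other hosts are never followed in this sketch.
    if start.netloc.lower() != cand.netloc.lower():
        return False
    # Without backward links, only pages under the start path are followed.
    if not allow_backward_links and not is_sublink(start_url, candidate):
        return False
    # An optional path filter narrows the crawl again, e.g. to one publication.
    if include_prefixes:
        return any(_under_prefix(cand.path, p) for p in include_prefixes)
    return True


start = "https://medium.com/etherfi"
print(should_crawl(start, "https://medium.com/etherfi/some-post"))
print(should_crawl(start, "https://medium.com/@someone/other-post",
                   allow_backward_links=True))
print(should_crawl(start, "https://medium.com/@someone/other-post",
                   allow_backward_links=True, include_prefixes=("/etherfi",)))
```

The idea is that backward links widen the crawl to the whole host, and a path filter on top narrows it back down to only the pages you care about. Whether that combination fixes the https://medium.com/etherfi case in firecrawl itself is an assumption that still needs checking against the real code.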