Crawling Website Issues with n8n
Hi all! I am having trouble crawling websites (from Google Sheets) looking for specific keywords.
I'm not using a main URL. Here is an example: https://www.whitehouse.gov/presidential-actions/executive-orders/
Looking for keywords like:
- trade
- freight
- foreign trade
- export
- import
- commerce
- sanctions
- customs
- tariff
- licensing
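For reference, here is a minimal sketch of checking a page's text against the list above once the text has been fetched. The function name `find_keywords` is made up for illustration, and the choice to match whole words (with an optional trailing "s", so "tariffs" counts as "tariff") is an assumption, not something anyone in this thread specified.

```python
import re

# Keyword list copied from the post above.
KEYWORDS = [
    "trade", "freight", "foreign trade", "export", "import",
    "commerce", "sanctions", "customs", "tariff", "licensing",
]

def find_keywords(text, keywords=KEYWORDS):
    """Return the keywords found in text, in list order, case-insensitively.

    Matches whole words or phrases only, allowing an optional trailing "s",
    so "imports" matches "import" but "important" does not.
    """
    found = []
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"s?\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(keyword)
    return found
```

Usage would look like `find_keywords(page_text)`, where `page_text` is whatever markdown or plain text the crawl returned for one page. A page with an empty result could simply be skipped before writing to the sheet.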
7 Replies
Hey! I'm not sure if I completely understand your use case, but would the /search endpoint help here? https://docs.firecrawl.dev/features/search
I want to crawl (https://www.whitehouse.gov/presidential-actions/executive-orders/)
I want the data to come back in these fields (in Google Sheets): Source Link, Issuing Agency, Title, Date, Keywords, Document Type, File Reference, Full Text
To extract structured data during crawling, please check out JSON mode: https://docs.firecrawl.dev/features/llm-extract.
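As a rough sketch of what that could look like for the fields listed above: JSON mode is driven by a schema, so one option is to derive a JSON Schema from the sheet's column names and then flatten each extracted record back into a row. Everything here is an assumption for illustration (the snake_case key names, treating Keywords as a list, the `required` fields, and the helper names `build_schema` and `to_row`). How the schema is passed in the request depends on the API version, so check the linked docs page for the exact request shape.

```python
# Column names taken from the post above, in sheet order.
COLUMNS = [
    "Source Link", "Issuing Agency", "Title", "Date",
    "Keywords", "Document Type", "File Reference", "Full Text",
]

def to_key(column):
    """Turn a sheet column name into a snake_case JSON key (assumed naming)."""
    return column.lower().replace(" ", "_")

def build_schema(columns=COLUMNS):
    """Build a JSON Schema with one property per sheet column."""
    properties = {}
    for column in columns:
        if column == "Keywords":
            # Assumption: keywords come back as a list of strings.
            properties[to_key(column)] = {"type": "array", "items": {"type": "string"}}
        else:
            properties[to_key(column)] = {"type": "string"}
    return {
        "type": "object",
        "properties": properties,
        # Assumption: only these two fields are treated as mandatory.
        "required": [to_key("Source Link"), to_key("Title")],
    }

def to_row(record, columns=COLUMNS):
    """Flatten one extracted record into a list of cell values in column order."""
    row = []
    for column in columns:
        value = record.get(to_key(column))
        if value is None:
            value = ""  # missing or null fields become empty cells
        elif isinstance(value, list):
            value = ", ".join(value)
        row.append(value)
    return row
```

`build_schema()` gives the schema to send with the request, and `to_row(record)` gives a list that can be appended to the sheet as one row.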
Oh I've been all over that. Unfortunately I keep getting this error: "URL must have a valid top-level domain or be a valid path"
Oh that error doesn't have anything to do with JSON mode. Can you share your request? Seems that the URL you passed in is not formatted correctly.
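While waiting on the actual request, one thing that can be ruled out is the cell value itself. This is only a guess, not a confirmed cause: values read from a spreadsheet sometimes carry stray whitespace, wrapping quotes or parentheses, a missing scheme, or a header row. A minimal sketch that cleans a cell and rejects anything that does not look like an http(s) URL with a dotted host (the name `clean_url` and the exact rules are assumptions for illustration):

```python
from urllib.parse import urlsplit

def clean_url(cell):
    """Return a cleaned absolute URL, or None if the cell does not look like one."""
    if cell is None:
        return None
    # Drop surrounding whitespace and common wrapping characters.
    url = str(cell).strip().strip("()<>\"'").strip()
    if not url or any(ch.isspace() for ch in url):
        return None  # empty cell, or text such as a header row
    if "://" not in url:
        url = "https://" + url  # assumption: default to https when no scheme is given
    try:
        parts = urlsplit(url)
    except ValueError:
        return None
    if parts.scheme not in ("http", "https"):
        return None
    labels = (parts.hostname or "").split(".")
    # Require at least "name.tld", no empty labels, and an alphabetic TLD.
    if len(labels) < 2 or not all(labels) or not labels[-1].isalpha():
        return None
    return url
```

Rows where `clean_url` returns `None` can be logged and skipped, which at least shows which sheet values would be rejected before any request is sent.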
I just can't seem to figure out how to crawl a website for specific info...
Pretty discouraged, but still fighting the good fight with Google
Hmm, sorry to hear that. Which website are you trying to crawl?