Firecrawl · 14mo ago
Thaslu

Help with scraping data inside links

When I used Firecrawl to scrape data from a job site, it only scraped the initial page. The actual data is on the pages behind the job title links, and I want to extract that too. How can I achieve it? Here is a sample screenshot of the page.
9 Replies
Adobe.Flash · 14mo ago
Hey @Thaslu, it might be wise to try a crawl with the allowBackwardLinks option set to true. The job pages are very likely not children (by URL) of the page you are starting the crawl on.
Thaslu (OP) · 14mo ago
It still isn't working. Any help?
Adobe.Flash · 14mo ago
Cc'ing @thomas here to take a deeper look. @Thaslu, can you share the URL with us too?
Adobe.Flash · 14mo ago
Thanks @Thaslu! Forwarded that to our web engineer to see what's going on.
thomas · 14mo ago
Hey @Thaslu, scrape only returns the data that is visible on the page you give it. If I understand correctly that you need the content of all the links, you probably need crawl.
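To illustrate the difference: a single scrape returns only the listing page, so reaching the job details means either following each job-title link yourself or letting crawl do it. Below is a minimal sketch of the manual route, assuming the listing page's markdown contains the job links in standard `[title](url)` form. `extract_links` and `scrape_linked_pages` are hypothetical helper names, and `scrape_fn` stands in for whatever function fetches one page (for example a wrapper around `scrape_url`).

```python
import re
from urllib.parse import urljoin

# Matches markdown links of the form [text](url); image links (![alt](src)) are skipped.
LINK_RE = re.compile(r'(?<!!)\[([^\]]*)\]\(([^)\s]+)\)')


def extract_links(markdown, base_url):
    """Return absolute, de-duplicated http(s) link URLs found in a markdown string, in order."""
    seen = []
    for _text, href in LINK_RE.findall(markdown):
        absolute = urljoin(base_url, href)  # resolve relative links against the listing URL
        if absolute.startswith(('http://', 'https://')) and absolute not in seen:
            seen.append(absolute)
    return seen


def scrape_linked_pages(scrape_fn, listing_markdown, base_url):
    """Call scrape_fn(url) for every link on the listing page; returns {url: result}."""
    return {url: scrape_fn(url) for url in extract_links(listing_markdown, base_url)}
```

This follows every link on the page, so in practice you would probably filter the URLs down to the job-detail pattern of the site before scraping them.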
Thaslu (OP) · 14mo ago
def scrape_data(url):
    try:
        app = FirecrawlApp(api_key=os.getenv('FIRECRAWL_API_KEY'))
        scraped_data = app.scrape_url(url, {'pageOptions': {'onlyMainContent': False}})
        if 'markdown' in scraped_data:
            return scraped_data['markdown']
        else:
            raise KeyError("The key 'markdown' does not exist in the scraped data.")
    except Exception as e:
        logger.error(f"Error scraping data: {e}")

This is the setup I have so far. What changes do I need to make?
thomas · 14mo ago
Firecrawl Docs
Crawl | Firecrawl
Firecrawl can recursively search through a URL's subdomains and gather the content.
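As a rough sketch of what switching the function above from scrape to crawl might look like: the code below calls `crawl_url` with the `allowBackwardLinks` option suggested earlier in the thread and collects the markdown of every page returned. The layout of the `params` dictionary and of the result (a list of page dicts, or a dict with a `data` list) varies between SDK versions, so both are assumptions to check against the Crawl docs. `crawl_data`, `build_crawl_params`, and the `limit` value are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def build_crawl_params(limit=50):
    """Crawl options. Key names and nesting are assumptions; verify them for your SDK version."""
    return {
        'limit': limit,              # cap on the number of pages (hypothetical value)
        'allowBackwardLinks': True,  # follow links that are not URL children of the start page
    }


def crawl_data(app, url, limit=50):
    """Crawl from url and return a list of markdown strings, one per page that has markdown.

    `app` is expected to be a FirecrawlApp instance (or any object with a compatible crawl_url).
    """
    try:
        result = app.crawl_url(url, params=build_crawl_params(limit))
    except Exception as e:
        logger.error(f"Error crawling data: {e}")
        return []
    # Assumed result shapes: a list of page dicts, or a dict wrapping them under 'data'.
    pages = result.get('data', []) if isinstance(result, dict) else (result or [])
    return [p['markdown'] for p in pages if isinstance(p, dict) and p.get('markdown')]
```

With a real client this would be called as `crawl_data(FirecrawlApp(api_key=os.getenv('FIRECRAWL_API_KEY')), url)`. Unlike the original function, it returns an empty list on failure instead of `None`, so the caller can always iterate over the result.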
Thaslu (OP) · 14mo ago
What are the best parameters for these types of websites?
