Firecrawl batch scrape cannot crawl some URLs

5 Replies
Duong Le (OP), 2mo ago
Dear @Firecrawl Team, I checked the source code of the Firecrawl Python SDK. It doesn't mention ignore_invalid_urls anywhere, and the kwargs validation function doesn't list it among the allowed parameters either: "batch_scrape_urls": {"formats", "headers", "include_tags", "exclude_tags", "only_main_content", "wait_for", "timeout", "location", "mobile", "skip_tls_verification", "remove_base64_images", "block_ads", "proxy", "extract", "json_options", "actions", "agent", "webhook"},
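To illustrate why a keyword missing from that allow-list would be rejected, here is a minimal sketch of an allow-list check. The names `validate_kwargs` and `is_accepted` and the error message are hypothetical, not copied from the SDK; only the set of allowed names comes from the list quoted above.

```python
# Allowed keyword names for batch_scrape_urls, as quoted above.
ALLOWED_BATCH_SCRAPE_KWARGS = {
    "formats", "headers", "include_tags", "exclude_tags", "only_main_content",
    "wait_for", "timeout", "location", "mobile", "skip_tls_verification",
    "remove_base64_images", "block_ads", "proxy", "extract", "json_options",
    "actions", "agent", "webhook",
}


def validate_kwargs(kwargs, allowed=ALLOWED_BATCH_SCRAPE_KWARGS):
    """Hypothetical check: raise ValueError if any keyword is not allowed."""
    unknown = sorted(set(kwargs) - set(allowed))
    if unknown:
        raise ValueError(f"Unsupported parameter(s): {', '.join(unknown)}")


def is_accepted(kwargs):
    """Return True if every keyword passes the allow-list check."""
    try:
        validate_kwargs(kwargs)
    except ValueError:
        return False
    return True
```

With a check like this, `is_accepted({"formats": ["markdown"]})` passes, while `is_accepted({"ignore_invalid_urls": True})` fails, because that name is not in the set.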
micah.stairs, 2mo ago
Did you try it, though? It was working on my side with the latest version of the SDK.
Duong Le (OP), 2mo ago
Can you share your code with me?
micah.stairs, 2mo ago
Here you go!
import requests

# Batch scrape endpoint of the Firecrawl v1 REST API
url = "https://api.firecrawl.dev/v1/batch/scrape"

payload = {
    "urls": [
        "https://www.britannica.com/biography/Stephen-Colbert",
        "https://www.biography.com/movies-tv/stephen-colbert",
        "https://www.cbs.com/shows/the-late-show-with-stephen-colbert/",
        "https://www.imdb.com/name/nm0170306/bio/",
        "https://en.wikipedia.org/wiki/Stephen_Colbert",
        "https://en.wikipedia.org/wiki/The_Late_Show_with_Stephen_Colbert",
        "https://www.televisionacademy.com/bios/stephen-colbert",
        "https://en.wikipedia.org/wiki/List_of_awards_and_nominations_received_by_Stephen_Colbert",
        "https://www.imdb.com/name/nm0170306/awards/",
        "https://www.youtube.com/channel/UCMtFAi84ehTSYSE9XoHefig",
    ],
    # Passed directly in the request body (camelCase)
    "ignoreInvalidURLs": True,
}

headers = {
    "Authorization": "Bearer YOUR_API_KEY_HERE",
    "Content-Type": "application/json",
}

response = requests.post(url, json=payload, headers=headers)
print(response.json())
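As a follow-up sketch, the same request can be wrapped in small helpers that also report which URLs were skipped. The helper names (`build_payload`, `start_batch_scrape`, `skipped_urls`) and the 30-second timeout are hypothetical, and the `invalidURLs` response field is an assumption about the API response, so check what actually comes back on your side.

```python
API_URL = "https://api.firecrawl.dev/v1/batch/scrape"


def build_payload(urls, ignore_invalid=True):
    """Build the JSON body for the batch scrape request."""
    return {"urls": list(urls), "ignoreInvalidURLs": ignore_invalid}


def start_batch_scrape(api_key, urls, timeout=30):
    """POST the batch job and return the decoded JSON response."""
    # Imported here so the pure helpers work even without requests installed.
    import requests

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    response = requests.post(
        API_URL, json=build_payload(urls), headers=headers, timeout=timeout
    )
    response.raise_for_status()  # surface HTTP errors instead of printing them
    return response.json()


def skipped_urls(result):
    """Return URLs reported as invalid (assumed field name: invalidURLs)."""
    return result.get("invalidURLs") or []
```

Usage would look like `result = start_batch_scrape("YOUR_API_KEY_HERE", urls)` followed by `print(skipped_urls(result))`. Keeping the payload builder separate from the network call makes it easy to inspect exactly what is sent.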
