Firecrawl seems to ignore URL hash fragments for pagination.
- Site: https://europa.provincia.bz.it/it/bandi-e-avvisi (pagination via #start=N)
- Expectation (Playwright): total=15, page 0 -> 10 items, page 1 -> 5 items (selector div.result.lv_faq).
- Firecrawl scrape_url results: page 1 returns the same content as page 0, or misses p.result_status entirely.
- Tried: formats=["html"], formats=["rawHtml"], wait_for=12000–20000, max_age=0 (fresh fetch), correct base URL from container to host.
Ask: Does scrape_url honor initial URL hash fragments and trigger the page’s JS that reads location.hash on load? Any flag/workaround for hash-based pagination?
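For context on why this matters: a URL fragment is never part of the HTTP request, so the server sees the same URL for every page and only client-side JS reading location.hash can tell the pages apart. A minimal sketch of that, using only the standard library (the helper names `page_url` and `server_visible_url` are made up for illustration, and how the site interprets N is not stated in this thread):

```python
from urllib.parse import urldefrag

BASE_URL = "https://europa.provincia.bz.it/it/bandi-e-avvisi"


def page_url(n: int) -> str:
    """Build the paginated URL by putting N into the #start=N fragment."""
    return f"{BASE_URL}#start={n}"


def server_visible_url(url: str) -> str:
    """Return the URL without its fragment, i.e. what an HTTP request carries."""
    return urldefrag(url).url


if __name__ == "__main__":
    for n in (0, 10):
        url = page_url(n)
        # Both lines print the same server-visible URL.
        print(url, "->", server_visible_url(url))
```

So any difference between the pages has to come from the browser running the page's script after load, which is why the scraper needs to keep the fragment and execute JS.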
The Playwright code I currently use works and handles the hash-based pagination correctly.
Goal: I want to switch to using Firecrawl to fetch the HTML (formats=["html"], wait_for, max_age=0) and parse it myself, so we can migrate away from direct Playwright usage in our container.
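A rough sketch of the "parse it myself" step, assuming the fetched HTML is already available as a string (from Firecrawl, Playwright, or anything else). This is not the original Playwright code from this thread; `ResultCounter` and `summarize` are hypothetical names, and only the selectors div.result.lv_faq and p.result_status come from the post above:

```python
from html.parser import HTMLParser


class ResultCounter(HTMLParser):
    """Count div.result.lv_faq elements and note whether p.result_status exists."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0
        self.has_status = False

    def handle_starttag(self, tag, attrs):
        # A class attribute may be absent or valueless, hence the "or ''".
        classes = set((dict(attrs).get("class") or "").split())
        if tag == "div" and {"result", "lv_faq"} <= classes:
            self.count += 1
        if tag == "p" and "result_status" in classes:
            self.has_status = True


def summarize(html: str) -> tuple[int, bool]:
    """Return (number of result items, whether the status paragraph is present)."""
    parser = ResultCounter()
    parser.feed(html)
    parser.close()
    return parser.count, parser.has_status
```

Running `summarize` on the HTML returned for page 0 and page 1 and comparing the counts against the Playwright baseline (10 and 5 items) would show quickly whether the fragment was honored.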
Good question! We are about to roll out this feature actually: https://github.com/firecrawl/firecrawl/pull/2031.
Great! Thanks for the update!
Hey! So we now support hash-based routes starting with "#/", but that doesn't actually cover your case (#start=N) after all. I've filed a feature request for this and I will keep you posted, but I can't promise any timelines.
Wait, because actually it does?
I've switched to using scrape with the html format, and it does work as intended.
Oh awesome! Thanks for letting me know