Help Needed: Downloading Linked PDF Files with Crawlee 🕸📥
Hello everyone,
I need some help with Crawlee. I've been using CheerioCrawler to scrape pages and I've managed to extract links and store page titles and URLs into a dataset. Now I want to add functionality to download linked files, like PDFs, from the scraped pages. However, I'm unsure how to do this natively with Crawlee.
Here's my current code:
Could anyone guide me on how to modify this code to download linked files, specifically PDFs, from the scraped pages? Any help would be appreciated, thank you!
5 Replies
deep-jade OP • 2y ago
can anyone help?
Hello @Alex
Here is some code I haven't tested, but you may get the idea from it:
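A minimal sketch of that idea, not necessarily the code originally posted. It assumes CheerioCrawler is allowed to load PDFs through its `additionalMimeTypes` option and that PDF responses are saved with `KeyValueStore.setValue`. The helper names `pdfKeyFromUrl`, `savePdf` and `handleResponse` are hypothetical, and the store is passed in as a parameter so the logic can be tried without a live crawl:

```javascript
// Hypothetical helper: derive a storage key from a URL.
// The character set is kept deliberately conservative (letters, digits, _ and -).
function pdfKeyFromUrl(url) {
    const { hostname, pathname } = new URL(url);
    const key = `${hostname}${pathname}`
        .replace(/\.pdf$/i, '')            // the store adds the extension from the content type
        .replace(/[^a-zA-Z0-9_-]+/g, '-')  // collapse everything else into dashes
        .replace(/^-+|-+$/g, '')           // trim leading/trailing dashes
        .slice(0, 200);
    return key || 'file';
}

// Save one PDF body. `store` only needs a setValue(key, value, options) method,
// which is the shape of Crawlee's KeyValueStore.
async function savePdf(store, url, body) {
    const key = pdfKeyFromUrl(url);
    await store.setValue(key, body, { contentType: 'application/pdf' });
    return key;
}

// Request handler logic: store PDF responses, follow links on everything else.
// Returns the key of the saved PDF, or null for HTML pages.
async function handleResponse({ request, body, contentType, enqueueLinks }, store) {
    if (contentType.type === 'application/pdf') {
        return savePdf(store, request.loadedUrl ?? request.url, body);
    }
    await enqueueLinks(); // by default this also enqueues links that point to PDFs
    return null;
}

// Wiring into Crawlee (not executed here; requires `npm install crawlee`):
//
// import { CheerioCrawler, KeyValueStore } from 'crawlee';
// const store = await KeyValueStore.open();
// const crawler = new CheerioCrawler({
//     additionalMimeTypes: ['application/pdf'], // otherwise PDF responses are rejected
//     requestHandler: (ctx) => handleResponse(ctx, store),
// });
// await crawler.run(['https://example.com']); // placeholder start URL
```

Two URLs that differ only in their query string would map to the same key in this sketch, so a hash of the full URL may be a safer key if that matters for your site.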
Basically it will store all the PDFs in storage/key_value_stores/default when running locally.
deep-jade OP • 2y ago
Hi Pepa, thx, very helpful! Do you have any hints on how to use Firebase Storage instead of the local key-value store? My goal is to analyze the PDFs with an LLM and store the results in a vector database.
I believe there is an npm package for Firebase with proper documentation, but I have no personal experience with it.
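As a rough, untested sketch of what that could look like: the `firebase-admin` package exposes Cloud Storage buckets, and a file in a bucket can be written with `file(path).save(...)`. The `uploadPdf` helper, the `pdfs/` folder and the bucket name placeholder below are assumptions for illustration only, and `body` stands for the PDF Buffer received in the Crawlee request handler:

```javascript
// Hypothetical helper: upload one PDF to a Cloud Storage bucket.
// `bucket` is assumed to behave like the object returned by
// getStorage().bucket() in firebase-admin.
async function uploadPdf(bucket, key, body) {
    const path = `pdfs/${key}.pdf`; // hypothetical folder layout
    await bucket.file(path).save(body, { contentType: 'application/pdf' });
    return path;
}

// Wiring (not executed here; requires `npm install firebase-admin`
// and valid service credentials):
//
// import { initializeApp } from 'firebase-admin/app';
// import { getStorage } from 'firebase-admin/storage';
// initializeApp({ storageBucket: '<your-bucket-name>' }); // placeholder
// const bucket = getStorage().bucket();
// await uploadPdf(bucket, 'annual-report', body);
```

Passing the bucket in as a parameter keeps the upload step separate from the crawler, so the later LLM analysis can read from the same paths.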
deep-jade OP • 2y ago
will look into that. thank you very much Pepa