Crawlee & Apify•17mo ago

Loading files along with HTML-scraped content via LangChain's ApifyDatasetLoader

The ApifyDatasetLoader for LangChain loads the records, which include the text, metadata, and fileUrl fields. All of the examples show loading content via the text or metadata fields — but what about fileUrl? Assuming the run has records for PDF, XLSX, and/or other files, is there an example of how to load those files alongside the scraped HTML content?

2 Replies

Alexey Udovydchenko•17mo ago

It's outside of the SDK's functionality: https://llamahub.ai/l/apify-dataset. Check their Git repo or post the question there, I guess.
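
For anyone landing here later, a rough sketch of one way to handle it on your own side, since the loader does not fetch files for you. It assumes each dataset item carries the `text`, `metadata`, and `fileUrl` fields mentioned in the question. `Doc` is just a stand-in for LangChain's `Document`, and `item_to_doc`, `fetch_bytes`, and `extract_text` are hypothetical names, not part of any SDK. You would have to supply `extract_text` yourself (for example, a PDF or XLSX parser of your choice).

```python
from dataclasses import dataclass, field
from typing import Callable
from urllib.request import urlopen


@dataclass
class Doc:
    """Stand-in for LangChain's Document: text content plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def fetch_bytes(url: str) -> bytes:
    """Download the raw bytes behind a file URL (hypothetical helper)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def item_to_doc(
    item: dict,
    extract_text: Callable[[bytes, str], str],
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> Doc:
    """Map one dataset item to a Doc.

    Items with a fileUrl are downloaded and passed to the caller-supplied
    extract_text(data, url); all other items fall back to their text field.
    """
    metadata = dict(item.get("metadata") or {})
    file_url = item.get("fileUrl")
    if file_url:
        # Keep the source URL so the file can be traced back later.
        metadata["fileUrl"] = file_url
        text = extract_text(fetch(file_url), file_url)
    else:
        text = item.get("text") or ""
    return Doc(page_content=text, metadata=metadata)
```

If I remember the integration correctly, `ApifyDatasetLoader` takes a `dataset_mapping_function` argument, so a function like this (returning a real `Document` instead of `Doc`) could probably be passed there, but check the integration repo to be sure.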

flat-fuchsiaOP•17mo ago

Got it, thanks, will check via the integration repo.
