Loading files along with HTML-scraped content via LangChain's ApifyDatasetLoader

The ApifyDatasetLoader for LangChain loads the records, which include the text, metadata, and fileUrl fields. All of the examples show loading content via the text or metadata fields — but what about fileUrl? Assuming the run has records for PDF, XLSX, and/or other files, is there an example of how to load those files alongside the scraped HTML content?
2 Replies
Alexey Udovydchenko
It's outside of SDK functionality: https://llamahub.ai/l/apify-dataset. Check their git or post the question there, I guess.
flat-fuchsia (OP), 17mo ago
Got it, thanks, will check via the integration repo.
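
For anyone landing here later, a rough sketch of one possible workaround, not a built-in feature of the loader: keep fileUrl in each document's metadata through the loader's dataset_mapping_function, then download and parse the linked files in a separate step. The field names text, metadata, and fileUrl come from the question above; the helper names (to_document_fields, collect_file_urls, build_loader) and the file_url metadata key are made up for illustration, and the actual item shape depends on the Actor that produced the dataset.

```python
from typing import Any, Dict, List


def to_document_fields(item: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one dataset item into page_content plus metadata, keeping fileUrl.

    Assumes items may carry "text", "metadata", and "fileUrl" keys,
    as described in the question; any of them may be missing or None.
    """
    metadata = dict(item.get("metadata") or {})
    if item.get("fileUrl"):
        # Hypothetical key name; pick whatever suits your pipeline.
        metadata["file_url"] = item["fileUrl"]
    return {"page_content": item.get("text") or "", "metadata": metadata}


def collect_file_urls(docs_fields: List[Dict[str, Any]]) -> List[str]:
    """List the file links found, to download and parse in a separate step."""
    return [
        fields["metadata"]["file_url"]
        for fields in docs_fields
        if "file_url" in fields["metadata"]
    ]


def build_loader(dataset_id: str):
    """Wire the mapping into ApifyDatasetLoader.

    Imports are deferred so the helpers above work without LangChain
    installed. Calling this needs langchain-community, apify-client,
    and an Apify API token in the environment.
    """
    from langchain_community.document_loaders import ApifyDatasetLoader
    from langchain_core.documents import Document

    return ApifyDatasetLoader(
        dataset_id=dataset_id,
        dataset_mapping_function=lambda item: Document(**to_document_fields(item)),
    )
```

Calling build_loader("your-dataset-id").load() should then return documents where file-only records have empty page_content and a file_url in their metadata. Those URLs can be downloaded and passed to whichever file loader fits the type (PDF, XLSX, and so on), and the resulting documents merged with the scraped HTML ones.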
