How to parse out results in Langchain ApifyDatasetLoader
I'm using the Google Search Results Scraper, which returns a single JSON object with Paid, Organic and a few other keys. I'd like to parse the Organic titles and URLs into a Langchain agent, but it's not clear how to iterate over them. Any suggestions?
loader = apify.call_actor(
    actor_id="apify/google-search-scraper",
    # Prepare the Actor input
    run_input={
        "queries": query,
        "maxPagesPerQuery": 1,
        "resultsPerPage": 100,
        "customDataFunction": """async ({ input, $, request, response, html }) => {return {pageTitle: $('title').text(),};}""",
    },
    # Map each dataset item to one Document
    dataset_mapping_function=lambda item: Document(
        page_content=item["url"] or "", metadata={"source": item["url"]}
    ),
)
1 Reply
correct-apricot•2y ago
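Well, the problem is that one dataset item contains many organic results, while dataset_mapping_function returns a single Document per item, so you will need to create several documents from each item. I think that would need to be done separately, outside the loader.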
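To illustrate creating more documents from one dataset item, here is a minimal sketch, not a confirmed recipe. It assumes each item has an `organicResults` list whose entries carry `title` and `url` keys (check the field names against your actual dataset). `Doc` is a hypothetical stand-in for LangChain's `Document` so the snippet runs on its own, and `organic_results_to_documents` is a helper name made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document (same two fields)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def organic_results_to_documents(items):
    """Flatten dataset items into one Doc per organic result."""
    docs = []
    for item in items:
        # Assumption: "organicResults" holds a list of result dicts.
        for result in item.get("organicResults") or []:
            url = result.get("url") or ""
            if not url:
                continue  # skip results that have no URL
            docs.append(
                Doc(
                    page_content=result.get("title") or "",
                    metadata={"source": url},
                )
            )
    return docs


# Made-up sample item shaped like the output described in the question.
sample_items = [
    {
        "paidResults": [],
        "organicResults": [
            {"title": "First result", "url": "https://example.com/a"},
            {"title": "Second result", "url": "https://example.com/b"},
        ],
    }
]

docs = organic_results_to_documents(sample_items)
for doc in docs:
    print(doc.page_content, doc.metadata["source"])
```

In real use, `sample_items` would be replaced by the items read from the Actor run's dataset, and `Doc` by LangChain's `Document`. The resulting list can then be handed to the agent or an index in place of the loader's output.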