LLM Extract Does Not Do Whole Page?

@Caleb Trying to extract structured data from a website, but not all of the data is being scraped. Only the first entries at the top of the page are being scraped. Any suggestions?
9 Replies
Adobe.Flash, 13mo ago
Hey, could you share your request URL and schema so we can replicate? My guess is that it has to do with the page loading on scroll.
babyboomboom. (OP), 13mo ago
Thank you very much for helping! Just a caveat, I am not a coder or developer. So, there's a likelihood I am missing something in the request. Here you go:
Caleb, 13mo ago
Where is your extraction schema? That's the most important parameter to pass, because it tells the model exactly what format it should return the data in.
babyboomboom. (OP), 13mo ago
Caleb! Great to hear from you. In my ignorance, I put the schema in the prompt. The extraction produced the desired result as far as structuring the output correctly. However, it stopped after 10 extractions when there were over 500 more to do. Should I explicitly state the extraction schema?
Caleb, 13mo ago
Yes, explicitly state the extraction schema!
babyboomboom. (OP), 13mo ago
Caleb, understood, and thank you for your time. I declared the schema to scrape data off a simpler website. Here is the updated code I used:

from pydantic import BaseModel

class ExtractSchema(BaseModel):
    Address: str
    Location: str
    Price: int
    Beds: int
    Baths: float
    SqFt: int
    Px_SqFt: int
    Time_On_Redfin: str
babyboomboom. (OP), 13mo ago
# `app` is the scraping client created earlier (not shown in the thread)
data = app.scrape_url(
    "https://www.redfin.com/zipcode/89134/filter/sort=lo-days",
    {
        # extract the listings
        "formats": ["extract"],
        "extract": {
            "schema": ExtractSchema.model_json_schema(),
            "prompt": "Extract all the listings from the redfin website",
        },
    },  # this closing brace was missing in the original post
)
print(data["extract"])
babyboomboom. (OP), 13mo ago
It does a wonderful job of correctly extracting data into the defined schema. But it only pulls up the first listing, and there are over 30 on that page. Can you please teach me what changes I need to make to capture data from all the listings?
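
One likely cause, offered as a guess rather than a confirmed fix: the ExtractSchema posted above describes a single listing, so an extraction that follows it has room for only one. A minimal sketch, assuming pydantic v2, wraps the listing model in a list field. The names Listing, ListingsPage, and listings are hypothetical and do not come from the thread.

```python
from typing import List

from pydantic import BaseModel


class Listing(BaseModel):
    # Same fields as the ExtractSchema posted above: one object per listing.
    Address: str
    Location: str
    Price: int
    Beds: int
    Baths: float
    SqFt: int
    Px_SqFt: int
    Time_On_Redfin: str


class ListingsPage(BaseModel):
    # Hypothetical wrapper: a list field gives the output room for many listings.
    listings: List[Listing]


schema = ListingsPage.model_json_schema()
# The top-level "listings" property is a JSON Schema array of Listing objects.
print(schema["properties"]["listings"]["type"])
```

Passing this schema in place of ExtractSchema.model_json_schema() in the request above would be the change to try. Whether every listing then comes back may also depend on how much of the page has loaded, as raised in the first reply.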
