wise-white
Pass args to handler
Hey
I have a crawler that scrapes a lot of different websites, each with multiple URLs.
Each website has an associated ID that I need for the dataset.
I want to scrape the URLs and send the data to a database right away, so I don't have to keep it on the EC2 instance.
Is there a way to pass extra variables to @router.default_handler?
for company in valid_company_urls:
    crawler = await create_crawler(config, company)
    # Run crawler for this company's URLs
    await crawler.run(company['url'])
When I do something like this, how can I pass additional arguments to run that are then passed on to the handler?
I have not found anything in the docs.
Thanks for any hints!
3 Replies
other-emerald•6mo ago
Not 100% sure if that is what you mean (or if it is the best solution), but you can pass arbitrary data through the user_data argument of Request.from_url, which you can then read in the handler. For example, this is what I do:
for xxx in xxx:
    Request.from_url(
        url=url,
        label=xxx,
        user_data={
            'abc': abc,
            'def': def_,  # variable renamed: `def` is a reserved word in Python
        },
    )
And then access it in the handler via context.request.user_data['abc']
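Applied to the original question, this could look roughly like the sketch below. It is only a sketch under some assumptions: the company dicts are assumed to have an 'id' key and a 'url' key holding a list of URLs, build_request_specs and the 'company_id' key are made-up names, and the imports follow recent Crawlee for Python releases (older releases used different module paths). BeautifulSoupCrawler is just one possible crawler class.

```python
def build_request_specs(companies):
    """Return one keyword-argument dict per URL, usable as Request.from_url(**spec)."""
    specs = []
    for company in companies:
        for url in company['url']:  # assumed: 'url' holds a list of URLs
            specs.append({
                'url': url,
                # The company ID travels with the request and comes back in the handler.
                'user_data': {'company_id': company['id']},  # assumed: 'id' key
            })
    return specs


async def main(companies):
    # Imported here so the helper above works without Crawlee installed.
    from crawlee import Request
    from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

    crawler = BeautifulSoupCrawler()

    @crawler.router.default_handler
    async def handler(context: BeautifulSoupCrawlingContext) -> None:
        company_id = context.request.user_data['company_id']
        # Placeholder: this is where the database write would go.
        context.log.info(f'{company_id}: {context.request.url}')

    requests = [Request.from_url(**spec) for spec in build_request_specs(companies)]
    await crawler.run(requests)
```

With this, a single crawler can handle all companies in one run via asyncio.run(main(valid_company_urls)), instead of creating one crawler per company.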
wise-whiteOP•6mo ago
Nice! That's exactly it!
Thanks!