Crawlee & Apify · 6mo ago
wise-white

Pass args to handler

Hey, I have a crawler that scrapes a lot of different websites, each with multiple URLs. Each website has an associated id that I need for the dataset. I want to scrape the URLs, get the data, and then immediately send it to a database so I don't have to keep it on the EC2 instance. Is there a way to pass extra variables to @router.default_handler? Currently I do something like this:

    for company in valid_company_urls:
        crawler = await create_crawler(config, company)
        # Run crawler for this company's URLs
        await crawler.run(company['url'])

How could I pass additional arguments to run so that they are passed on to the handler? I have not found anything in the docs. Thanks for any hints!
3 Replies
other-emerald · 6mo ago
Not 100% sure if that is what you mean (or if it is the best solution), but you can pass arbitrary data to Request.from_url via user_data, which you can then read in the handler. E.g. what I do:

    for xxx in xxx:
        Request.from_url(
            url=url,
            label=xxx,
            user_data={'abc': abc, 'def': def_},
        )

And then access it in the handler via context.request.user_data['abc'].
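For the per-company id case from the question, a fuller sketch might look like the one below. It is only an illustration under some assumptions: each company is a dict with an "id" and a list of "urls" (hypothetical keys), save_to_database is a made-up placeholder for whatever database client you use, and the import paths are those of a recent Crawlee for Python release (older versions exposed the crawler classes under different modules). No label is set, so the requests should end up in the default handler.

```python
def build_request_args(company: dict, url: str) -> dict:
    """Keyword arguments for Request.from_url, carrying the company id in user_data."""
    return {
        "url": url,
        # user_data travels with the request and is readable in the handler.
        "user_data": {"company_id": company["id"]},
    }


async def save_to_database(company_id, url, title) -> None:
    """Hypothetical placeholder: replace with a real database insert."""
    print(f"would store company={company_id} url={url} title={title!r}")


async def main(companies: list[dict]) -> None:
    # Imported here so the helper above works without Crawlee installed.
    # These import paths are an assumption about the installed Crawlee version.
    from crawlee import Request
    from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

    crawler = BeautifulSoupCrawler()

    @crawler.router.default_handler
    async def handler(context: BeautifulSoupCrawlingContext) -> None:
        # Read back the value attached when the request was created.
        company_id = context.request.user_data["company_id"]
        title = context.soup.title.string if context.soup.title else None
        # Send the result straight to the database instead of keeping it locally.
        await save_to_database(company_id, context.request.url, title)

    # One Request per URL, each tagged with the id of the company it belongs to.
    requests = [
        Request.from_url(**build_request_args(company, url))
        for company in companies
        for url in company["urls"]
    ]
    await crawler.run(requests)


# To try it out:
#   import asyncio
#   asyncio.run(main([{"id": 42, "urls": ["https://example.com"]}]))
```

Since the id travels with each request, a single crawler can probably handle all companies in one run, instead of creating a new crawler per company.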
wise-white (OP) · 6mo ago
Nice! That's exactly it! Thanks!
