Crawlee & Apify · 5mo ago
other-emerald

how to pass data to routes.py

If I use multiple files, what is the best way to pass data (user input, which contains 'max_results' or something similar) to my routes.py?
Example snippet, main.py:
max_results = 5 # example

crawler = PlaywrightCrawler(
    headless=False,
    request_handler=router,
)
await crawler.run([start_url])
Snippet, routes.py:
@router.default_handler
async def default_handler(context: PlaywrightCrawlingContext) -> None:
    max_results = ???
4 Replies
Hall · 5mo ago
Someone will reply to you shortly. In the meantime, this might help:
other-emerald (OP) · 5mo ago
Is this good?
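from crawlee import Request

# Attach the user input to the start request so the handler can read it back.
request = Request.from_url(
    url=start_url,
    user_data={
        "max_results": max_results,
    },
)

print(start_url)
crawler = PlaywrightCrawler(
    headless=False,
    request_handler=router,
)
await crawler.run([request])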
@router.default_handler
async def default_handler(context: PlaywrightCrawlingContext) -> None:
    max_results = context.request.user_data.get('max_results')
    print(f"Max results: {max_results}")
Mantisus · 5mo ago
Yes, I myself often use this exact approach, passing data through user_data.
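
One thing to watch with this approach: `user_data.get('max_results')` returns None when the key is missing, for example on a request that was created without that entry. Below is a minimal sketch of a small reader function for routes.py, assuming `user_data` behaves like a mapping (as the `.get` call above suggests). The name `read_max_results` and the fallback of 10 are hypothetical and not part of Crawlee.

```python
from collections.abc import Mapping
from typing import Any

DEFAULT_MAX_RESULTS = 10  # hypothetical fallback, pick whatever suits your crawler


def read_max_results(
    user_data: Mapping[str, Any],
    default: int = DEFAULT_MAX_RESULTS,
) -> int:
    """Return a positive integer 'max_results' from user_data, or the default."""
    raw = user_data.get("max_results", default)
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # None, or a string that is not a number: fall back to the default
        return default
    # Zero or negative limits make no sense here, so fall back as well
    return value if value > 0 else default
```

In the handler this would be called as `max_results = read_max_results(context.request.user_data)`.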
unwilling-turquoise · 5mo ago
Pass max_results via a shared configuration module. Create a config.py file to store global configuration variables that both main.py and routes.py can access.

Example:

config.py
max_results = 5  # Default value

main.py
import config
from crawlee.crawlers import PlaywrightCrawler  # crawlee.playwright_crawler in older Crawlee releases
from routes import router

config.max_results = 5  # Set max_results dynamically, e.g. from user input

crawler = PlaywrightCrawler(
    headless=False,
    request_handler=router,
)
await crawler.run([start_url])

routes.py
import config
from crawlee.crawlers import PlaywrightCrawlingContext  # crawlee.playwright_crawler in older Crawlee releases
from crawlee.router import Router

router = Router[PlaywrightCrawlingContext]()

@router.default_handler
async def default_handler(context: PlaywrightCrawlingContext) -> None:
    max_results = config.max_results  # read at call time so the value set in main.py is seen
    print(f"Max results: {max_results}")
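
A caveat with the config module approach: the handler should read `config.max_results` through the module, not through `from config import max_results`, because the latter copies the value at import time and will not see a later override from main.py. The sketch below illustrates this with an in-memory stand-in for config.py so it runs on its own; the function names are hypothetical.

```python
import sys
import types

# Build a stand-in for config.py in memory so this sketch is self-contained.
config = types.ModuleType("config")
config.max_results = 5  # default value, as in config.py
sys.modules["config"] = config

# Pattern A: keep a reference to the module and read the attribute at call time.
import config as cfg


def handler_reads_module() -> int:
    return cfg.max_results


# Pattern B: copy the value into a local name at import time.
from config import max_results as frozen_max_results


def handler_reads_copy() -> int:
    return frozen_max_results


# main.py later overrides the default, e.g. from user input.
config.max_results = 20

live_value = handler_reads_module()  # sees the override
stale_value = handler_reads_copy()   # still holds the import-time default
print(live_value, stale_value)
```

So routes.py should stick to `import config` and access `config.max_results` inside the handler.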
