Hey, why do I get the scraped content of the first URL when I passed a different one?
I implemented a Playwright crawler to parse a URL. I made a single request to the crawler with the first URL; while that request was still processing, I passed another URL to the crawler and sent the request. Both times the crawler returned content from the first URL instead of the second. Can you please help?
# import path may differ by crawlee version; in recent releases it is crawlee.crawlers
from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext

async def run_crawler(url, domain_name, save_path=None):
    print("doc url inside crawler file====================================>", url)
    crawler = PlaywrightCrawler(
        max_requests_per_crawl=10,
        browser_type='firefox',
    )

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Processing {url} ...')
        links = await context.page.evaluate(f'''() => {{
            return Array.from(document.querySelectorAll('a[href*="{domain_name}"]'))
                .map(a => a.href);
        }}''')
        await context.enqueue_links(urls=links)
        elements = await context.page.evaluate(PW_SCRAPING_CODE)
        data = {
            'url': url,
            'title': await context.page.title(),
            'content': elements,
        }
        print("data =================>", data)
        await context.push_data(data)

    await crawler.run([url])
I am calling the crawler using
3 Replies
stormy-gold•2w ago
Hi, could you please try to rephrase your question? I don't understand what the problem is.
If you create a new crawler for each URL, each with its own request queue/list, they won't share requests, so both crawlers may end up processing the same URL.
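One thing worth checking (an assumption from the snippet, not confirmed by the poster): inside `request_handler`, both the log line and the `data['url']` field use the outer `url` argument captured by the closure rather than the URL of the request actually being handled (`context.request.url` in Crawlee), so every page the crawler visits gets reported under the URL that was originally passed in. A plain-Python sketch of the pitfall, with no Crawlee involved:

```python
# A handler defined inside a function closes over that function's `url`
# argument, so it always reports the captured value, no matter which
# request it is actually processing.

def make_handler(url):
    def handler(request_url):
        # Bug: `url` is the value captured when the handler was created,
        # not the URL of the request being processed.
        return f'Processing {url}'
    return handler

handler = make_handler('https://first.example.com')

# Even when a different request comes in, the captured URL is reported:
print(handler('https://second.example.com'))  # Processing https://first.example.com

def handler_fixed(request_url):
    # Fix: use the per-request value (in Crawlee, context.request.url).
    return f'Processing {request_url}'

print(handler_fixed('https://second.example.com'))  # Processing https://second.example.com
```

In the original snippet the equivalent fix would be to log and store `context.request.url` inside the handler instead of the `url` parameter of `run_crawler`.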