How to re-visit a URL that has already been scraped?

Hi, I'm making a simple app that gets updated information from a website. It lives inside a FastAPI app and uses AsyncIOScheduler to run the script every day. The issue is that since the crawler has already visited the main page, it will not re-visit that page on the next call. I've done a lot of research but couldn't find a solution; other scrapers have something like a force= parameter to force the scrape. How can I force the UNPROCESSED state onto the request? Here is the code:
class Scraper:
    async def run_scraper(self):
        proxy_urls = process_proxy_file('proxy_list.txt')
        proxy_configuration = ProxyConfiguration(proxy_urls=proxy_urls)
        crawler = PlaywrightCrawler(proxy_configuration=proxy_configuration, headless=False, browser_type='chromium')

        @crawler.router.default_handler
        async def request_handler(context: PlaywrightCrawlingContext) -> None:
            print('Handling request...')
            context.request.state(RequestState.UNPROCESSED)

            # Scrape logic here
            # Return scraped data if needed

        request = Request.from_url('https://crawlee.dev')
        await crawler.run([request])
        return "Example Scraped Data"
mysterious-green (OP) · 8mo ago
Thanks @Tomáš Linhart, it worked. Here is how I did it:

request = Request.from_url('https://crawlee.dev', unique_key=str(uuid4()))
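
For anyone landing here later, a minimal sketch of that approach, assuming a recent crawlee for Python release (the PlaywrightCrawler import path and the proxy setup may differ in your version). Giving the request a random unique_key bypasses request deduplication, so the same URL is processed again on every scheduled run:

from uuid import uuid4

from crawlee import Request
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext


class Scraper:
    async def run_scraper(self) -> str:
        crawler = PlaywrightCrawler(headless=False, browser_type='chromium')

        @crawler.router.default_handler
        async def request_handler(context: PlaywrightCrawlingContext) -> None:
            print(f'Handling {context.request.url}...')
            # Scrape logic here

        # A fresh unique_key on every run means the URL is never treated as
        # "already handled", so the crawler re-visits it each time the
        # scheduler fires.
        request = Request.from_url('https://crawlee.dev', unique_key=str(uuid4()))
        await crawler.run([request])
        return 'Example Scraped Data'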
