Playwright increase timeout

While using Playwright with proxies, pages sometimes take longer to load. How can I increase the page load timeout?
Page.goto: Timeout 30000ms exceeded
other-emerald
other-emerald•6mo ago
Hi! Did you try to use this code?

try:
    await page.goto("https://example.com", timeout=60000)  # 60-second timeout
except Exception as e:
    print(f"Error loading the page: {e}")
rival-black
rival-black•6mo ago
If you're referring to the PlaywrightCrawler in crawlee, you can increase the default timeout by passing the appropriate parameter to the browser. Update: solution here: https://discord.com/channels/801163717915574323/1314296091650428948/1314315014118834207
from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
from crawlee.browsers import BrowserPool, PlaywrightBrowserPlugin


user_plugin = PlaywrightBrowserPlugin(browser_options={"timeout": 60000})

browser_pool = BrowserPool(plugins=[user_plugin])

crawler = PlaywrightCrawler(browser_pool=browser_pool)
You can pass any parameters that Playwright supports. https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch
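For instance, a longer launch timeout can be combined with other launch options. A minimal sketch: slow_mo and headless below are standard Playwright launch parameters, and the values are only illustrative.

from crawlee.browsers import BrowserPool, PlaywrightBrowserPlugin

# Anything accepted by Playwright's browser_type.launch() can go in browser_options.
user_plugin = PlaywrightBrowserPlugin(
    browser_options={
        "timeout": 60000,   # ms to wait for the browser process to start
        "headless": True,   # run without a visible window
        "slow_mo": 250,     # slow each operation down by 250 ms (debugging aid)
    }
)
browser_pool = BrowserPool(plugins=[user_plugin])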
rival-black
rival-blackOP•6mo ago
I was referring to PlaywrightCrawler. Let me try this.
[crawlee.playwright_crawler._playwright_crawler] ERROR Request failed and reached maximum retries
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/crawlee/basic_crawler/_context_pipeline.py", line 65, in __call__
    result = await middleware_instance.__anext__()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/crawlee/playwright_crawler/_playwright_crawler.py", line 260, in _handle_blocked_request
    selector for selector in RETRY_CSS_SELECTORS if (await context.page.query_selector(selector))
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/playwright/async_api/_generated.py", line 8064, in query_selector
    await self._impl_obj.query_selector(selector=selector, strict=strict)
  File "/usr/local/lib/python3.12/site-packages/playwright/_impl/_page.py", line 414, in query_selector
    return await self._main_frame.query_selector(selector, strict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/playwright/_impl/_frame.py", line 304, in query_selector
    await self._channel.send("querySelector", locals_to_params(locals()))
  File "/usr/local/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 59, in send
    return await self._connection.wrap_api_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 520, in wrap_api_call
    raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
playwright._impl._errors.Error: Page.query_selector: Execution context was destroyed, most likely because of a navigation
[crawlee._autoscaling.autoscaled_pool] INFO Waiting for remaining tasks to finish
[crawlee.playwright_crawler._playwright_crawler] INFO Error analysis: total_errors=3 unique_errors=1
I am getting this error
from apify import Actor, Request
from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
from crawlee.proxy_configuration import ProxyConfiguration
from crawlee.browsers import BrowserPool, PlaywrightBrowserPlugin

async def main() -> None:
    async with Actor:
        # Retrieve the Actor input, and use default values if not provided.
        actor_input = await Actor.get_input() or {}
        start_urls = [url.get('url') for url in actor_input.get('start_urls', [{'url': 'https://apify.com'}])]
        proxy_settings = actor_input.get('proxy')
        proxy_configuration = ProxyConfiguration(proxy_urls=[
            'http://xxx:xxx@xxx:xxxx',
        ])

        # Exit if no start URLs are provided.
        if not start_urls:
            Actor.log.info('No start URLs specified in Actor input, exiting...')
            await Actor.exit()

        user_plugin = PlaywrightBrowserPlugin(browser_options={"timeout": 60000})
        browser_pool = BrowserPool(plugins=[user_plugin])

        # Create a crawler.
        crawler = PlaywrightCrawler(
            max_requests_per_crawl=50,
            proxy_configuration=proxy_configuration,
            browser_pool=browser_pool,
        )

        # Define a request handler, which will be called for every request.
        @crawler.router.default_handler
        async def request_handler(context: PlaywrightCrawlingContext) -> None:
            Actor.log.info("H")
            url = context.request.url
            Actor.log.info(f'Scraping {url}...')

        # Run the crawler with the starting requests.
        await crawler.run(start_urls)
This is the code I am trying
rival-black
rival-black•6mo ago
The same code without a proxy works correctly for me, even when slow_mo is set high to simulate a slow connection. Is it possible that the problem is with the proxy?
rival-black
rival-blackOP•6mo ago
Yes, the proxy is working; I checked locally.
rival-black
rival-black•6mo ago
Hmm, I don't have any ideas yet. The error looks like an attempt to work with a page that no longer exists in the browser's execution context. I would try raising request_handler_timeout, since its default value is 60 seconds; maybe the problem occurs when there is an interaction with an element and the handler is closed by the timeout.
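A minimal sketch of where that parameter goes, assuming the PlaywrightCrawler setup from the code above (the five-minute value is just an example):

from datetime import timedelta

from crawlee.playwright_crawler import PlaywrightCrawler

# request_handler_timeout bounds how long a single request handler may run.
crawler = PlaywrightCrawler(
    request_handler_timeout=timedelta(minutes=5),
)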
rival-black
rival-blackOP•6mo ago
Same error when using the Apify proxy. I tried on my local system with
user_plugin = PlaywrightBrowserPlugin(browser_options={"timeout": 600000, 'headless': False})
and
request_handler_timeout=timedelta(minutes=100)
but I am still getting this error:
    raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
playwright._impl._errors.TimeoutError: Page.goto: Timeout 30000ms exceeded.
Call log:
- navigating to "https://apify.com/", waiting until "load"
rival-black
rival-black•6mo ago
With Apify's auto proxy, it works on my side.
rival-black
rival-blackOP•6mo ago
the timeout is not changing
rival-black
rival-black•6mo ago
I apologize, my mistake. That timeout only affects the launch of the browser, not the page navigation 😢
rival-black
rival-blackOP•6mo ago
yes
@crawler.pre_navigation_hook
async def log_navigation_url(context: PlaywrightPreNavigationContext) -> None:
    context.log.info(f'Navigating to {context.request.url} ...')
    context.page.set_default_navigation_timeout(60000)
I think this should work
rival-black
rival-black•6mo ago
Yeah, I think that should help, too.
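(A side note, not verified in this thread: if non-navigation actions such as clicks or selector waits also time out, the same hook could raise their default as well. This sketch assumes the crawler object from the snippets above; set_default_timeout and set_default_navigation_timeout are both standard Playwright Page methods.)

@crawler.pre_navigation_hook
async def raise_timeouts(context: PlaywrightPreNavigationContext) -> None:
    # set_default_navigation_timeout covers page.goto(); set_default_timeout
    # covers other page actions such as clicks and waits. Values are in ms.
    context.page.set_default_navigation_timeout(60000)
    context.page.set_default_timeout(60000)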
rival-black
rival-blackOP•6mo ago
Now I am not receiving any timeout errors.
rival-black
rival-blackOP•6mo ago
BrowserContext | Playwright Python
BrowserContexts provide a way to operate multiple independent browser sessions.
rival-black
rival-black•6mo ago
Crawlee doesn't have access to the browser context right now; pre_navigation_hook is the only way available. So I think that's the best (and only) way. https://discord.com/channels/801163717915574323/1314296091650428948/1314315014118834207
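Putting the thread together, a minimal sketch of the working setup (the type of the pre-navigation context is left untyped here because its import path varies by crawlee version; timeout values are illustrative):

import asyncio
from datetime import timedelta

from crawlee.browsers import BrowserPool, PlaywrightBrowserPlugin
from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext


async def main() -> None:
    # Browser launch timeout: affects browser start-up only, not navigation.
    user_plugin = PlaywrightBrowserPlugin(browser_options={'timeout': 60000})
    browser_pool = BrowserPool(plugins=[user_plugin])

    crawler = PlaywrightCrawler(
        browser_pool=browser_pool,
        request_handler_timeout=timedelta(minutes=5),  # per-request handler budget
    )

    @crawler.pre_navigation_hook
    async def raise_navigation_timeout(context) -> None:
        # Runs before every navigation; this is where the page.goto() timeout lives.
        context.page.set_default_navigation_timeout(60000)

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Scraping {context.request.url}...')

    await crawler.run(['https://apify.com'])


if __name__ == '__main__':
    asyncio.run(main())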
rival-black
rival-blackOP•6mo ago
thank you
