websocket error during Apify run that is not from our code

In the middle of a Python Playwright run, we are getting this error:
ERROR Error in websocket connection
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 1301, in close_connection
    await self.transfer_data_task
  File "/usr/local/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 974, in transfer_data
    await asyncio.shield(self._put_message_waiter)
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/apify/event_manager.py", line 222, in _process_platform_messages
    async for message in websocket:
  File "/usr/local/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 498, in __aiter__
    yield await self.recv()
          ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 568, in recv
    await self.ensure_open()
  File "/usr/local/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 939, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1011 (internal error) keepalive ping timeout; no close frame received
We do not have any websocket code in our Actor. The traceback does not have any reference to our code.
7 Replies
generous-apricot (OP) • 15mo ago
Our Actor code is:
async with Actor:
    actor_input = await Actor.get_input() or {}
    proxy_settings = actor_input.get('proxies', None)
    proxy_configuration = await Actor.create_proxy_configuration(actor_proxy_input=proxy_settings)
    proxy = await proxy_configuration.new_proxy_info('session0')
    # Launch Playwright and open a new browser context
    Actor.log.info('Launching Playwright...')
    async with async_playwright() as playwright:
        # Strip everything up to and including '@' to get host:port
        server = proxy['url'][proxy['url'].find('@')+1:]
        print("Determined server is %s" % (server))
        proxy_block = {
            "server": server,
            "username": proxy['username'],
            "password": proxy['password']
        }
        browser = await playwright.chromium.launch(headless=Actor.config.headless, proxy=proxy_block)
        context = await browser.new_context()
        tasks = []

        task = asyncio.create_task(worker('worker-0', context))
        tasks.append(task)
        # Wait until all worker tasks finish or one throws an exception
        try:
            await asyncio.gather(*tasks)
        except Exception as e:
            raise e
        print("All done ")
The worker() task is our crawler that opens a page in playwright and parses a small piece out. We're not sure where, if anywhere, a websocket is running. Since the code is coming from apify/event_manager.py we think it's some Apify global code that's running, but we can't find it.
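As a side note on the proxy block in the Actor code above: it derives the server by slicing the proxy URL at the first '@'. A minimal sketch of the same step using the standard library's urlsplit is shown here. The function name proxy_block_from_url and the example URL are hypothetical and used only for illustration; the dictionary keys match the ones the original code passes to playwright.chromium.launch.

```python
from urllib.parse import urlsplit


def proxy_block_from_url(proxy_url):
    """Split a proxy URL into the server/username/password dict used above.

    Hypothetical helper: the name and the example URL are illustrative only.
    """
    parts = urlsplit(proxy_url)
    # Rebuild the server without credentials, keeping the scheme and port.
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None:
        server += f":{parts.port}"
    return {
        "server": server,
        "username": parts.username or "",
        "password": parts.password or "",
    }


# Made-up URL, only to show the shape of the result.
example = proxy_block_from_url("http://session0:secret@proxy.example.com:8000")
```

Unlike the string slice, this keeps the scheme in the server value and does not depend on where the first '@' appears in the URL.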
rare-sapphire • 15mo ago
Our team will reply to you soon! cc @Vlada Dusek
vicious-gold • 15mo ago
Hi, could you please provide a full reproducible code sample? (what is async_playwright and worker?) Is it happening both locally and on the platform?
absent-sapphire • 15mo ago
I'll state my assumptions. async_playwright is the standard entry point to the asynchronous Playwright API; see the documentation at https://playwright.dev/python/docs/api/class-playwright. Judging by how worker is called, it is a coroutine that receives the browser context and runs the scraping script. I am somewhat confused by this section of the code:
tasks = []

task = asyncio.create_task(worker('worker-0', context))
tasks.append(task)
Is it true that there is always only one task in your tasks? Or did you change the code when copying it here? Also, I don't see the browser closing anywhere.
await browser.close()
I mention this because the error may occur if the browser is not closed correctly.
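One way to make sure the browser is always closed, even when the worker raises, is a try/finally around the work. The sketch below is a minimal illustration, not the thread author's code: run_with_browser_cleanup, FakeBrowser, and failing_work are hypothetical names, and FakeBrowser stands in for a Playwright Browser only so the example runs without Playwright installed. The only thing it assumes about the browser object is an awaitable close() method.

```python
import asyncio


async def run_with_browser_cleanup(browser, work):
    """Run work(browser) and always close the browser afterwards."""
    try:
        return await work(browser)
    finally:
        # Runs on success, on exceptions, and on cancellation.
        await browser.close()


class FakeBrowser:
    """Hypothetical stand-in for a Playwright Browser, used only for the demo."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


async def failing_work(browser):
    # Simulates a worker that fails partway through a crawl.
    raise RuntimeError("page crashed")


async def demo():
    browser = FakeBrowser()
    try:
        await run_with_browser_cleanup(browser, failing_work)
    except RuntimeError:
        pass  # the error still propagates; only the cleanup is guaranteed
    return browser.closed


closed_after_failure = asyncio.run(demo())
```

In the Actor code, the equivalent would be to wrap the asyncio.gather call in try/finally and call await browser.close() in the finally branch.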
generous-apricot (OP) • 15mo ago
There is just one task per Actor. We ported this from another Actor we maintain that does multiple. But since this boots a browser, we keep it to one task.
generous-apricot (OP) • 15mo ago
async_playwright comes from: from playwright.async_api import async_playwright. The worker code is simply:
context = await browser.new_context(proxy=[we give it a new proxy])
page = await context.new_page()
await page.goto(url)
content = await page.content()
# do stuff with content
I did a big run last night and the websocket error has not returned, as far as I can see. I'm wondering if it was something spurious. Can you give any context on where _process_platform_messages is supposed to run and why it would time out?
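On the timeout question: the traceback shows the Apify SDK's event manager reading platform messages from a websocket, and the final error comes from the websockets library's keepalive (by default a ping every 20 seconds, with 20 seconds allowed for the pong). One possible cause, not confirmed for this run, is that the asyncio event loop was blocked for a long stretch by synchronous work, so the keepalive could not be serviced in time; a network hiccup would produce the same message. The sketch below only illustrates the blocking mechanism with the standard library. The names heartbeat, parse_blocking, and max_gap are hypothetical, and time.sleep stands in for slow synchronous parsing.

```python
import asyncio
import contextlib
import time


async def heartbeat(gaps, interval=0.05):
    """Record the real time between ticks, the way a keepalive would tick."""
    last = time.monotonic()
    while True:
        await asyncio.sleep(interval)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


def parse_blocking(seconds=0.5):
    """Hypothetical stand-in for slow, synchronous parsing of page content."""
    time.sleep(seconds)


async def max_gap(offload):
    """Return the longest heartbeat gap seen while the parsing runs."""
    gaps = []
    task = asyncio.create_task(heartbeat(gaps))
    await asyncio.sleep(0.1)  # let the heartbeat start ticking
    if offload:
        # Runs in a worker thread; the event loop keeps servicing the heartbeat.
        await asyncio.to_thread(parse_blocking)
    else:
        # Runs on the event loop thread; nothing else can run meanwhile.
        parse_blocking()
    await asyncio.sleep(0.1)  # give the heartbeat a chance to record the gap
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return max(gaps)


async def main():
    blocked = await max_gap(offload=False)
    offloaded = await max_gap(offload=True)
    return blocked, offloaded


blocked_gap, offloaded_gap = asyncio.run(main())
```

With the blocking call on the loop, the longest heartbeat gap grows to roughly the length of the blocking work; with asyncio.to_thread it stays close to the tick interval. If the "do stuff with content" step in the worker is CPU-heavy, moving it off the loop this way is one thing to try.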
