8 replies

websocket-upgrade fetch from worker to DO randomly delayed

We are using DOs for a registry that coordinates the running of multi-user web sessions. We have "synchronizer" nodes that are external to the registry, each maintaining long-lived websockets into the registry for its housekeeping tasks.

A synchronizer watches for any of its socket connections dropping, or responding too sluggishly. In such cases, it automatically re-connects by sending a wss request to our ingress worker, whose fetch delegates to methods like this:

function synchToSession(request: Request, env: Env, colo: string) {
    if (request.headers.get("Upgrade") === "websocket") {
      // request.url is a string, so parse it before reading the query string
      const sessionId = new URL(request.url).searchParams.get('session');
      // one DO instance per session name
      const runnerId = env.SESSION_RUNNER.idFromName(sessionId);
      const sessionRunner = env.SESSION_RUNNER.get(runnerId);

      console.log(`worker@${colo}: forwarding websocket`);

      // hand the upgrade request to the session runner DO
      return sessionRunner.fetch(request);
    }
}

...where the "session runner" DO has a fetch that boils down to:

async fetch(request: Request): Promise<Response> {
    // one end goes back to the caller, the other stays in the DO
    const { 0: clientSocket, 1: ourSocket } = new WebSocketPair();
    ourSocket.accept();
    // ...set up event handlers etc, then...
    return new Response(null, { status: 101, webSocket: clientSocket });
}

Although reconnections usually take on the order of 50 ms, every few hours we hit periods when several synchronizers all detect a sluggish response and try to re-connect, and those reconnections are held up for a second or more before all completing at the same time. The worst cases have a delay of over 10 seconds.

The logs show that almost the entire delay occurs between the worker's console message and the subsequent GET log line for the DO.

For context:
* the delays only rarely coincide with eviction and reload of DOs; generally the DOs are already active (i.e., no cold start involved).
* there is no other significant traffic to our ingress or workers.

How could we at least figure out where the time is going?
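
One thing that might narrow it down, as a rough sketch only: have the worker stamp the forwarded request with the time it was handed to the DO stub, and have the DO log the difference as the first thing in its fetch. The header name x-forwarded-at-ms and both helper names are made up for illustration and are not part of any Cloudflare API. This also assumes the clocks on the worker and DO hosts are close enough for delays of a second or more to stand out, which may not hold for small deltas.

```typescript
// Hypothetical header name, chosen for this sketch only.
const FORWARDED_AT = "x-forwarded-at-ms";

// Worker side: copy the request (the copy's headers are mutable) and record
// the time at which it is about to be passed to sessionRunner.fetch().
function stampForwardTime(request: Request, nowMs: number = Date.now()): Request {
  const stamped = new Request(request);
  stamped.headers.set(FORWARDED_AT, String(nowMs));
  return stamped;
}

// DO side: milliseconds elapsed since the worker's stamp, or null if the
// header is missing or is not a number.
function forwardDelayMs(request: Request, nowMs: number = Date.now()): number | null {
  const raw = request.headers.get(FORWARDED_AT);
  if (raw === null || raw.trim() === "") return null;
  const sentAt = Number(raw);
  return Number.isFinite(sentAt) ? nowMs - sentAt : null;
}
```

In the worker this would replace the last line of synchToSession with something like return sessionRunner.fetch(stampForwardTime(request)), and the DO would call forwardDelayMs(request) at the top of its fetch and log the result. A large value there would point at the hop between worker and DO; a small one would suggest the time is being lost before the worker's console message or after the DO returns its 101.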

Cloudflare Developers•2y ago•

thelunz
