Only one tunnel gets checked by the load balancer monitor

I have a load balancer with two tunnels attached. For some reason, only one tunnel seems to get monitor checks. I think this might also apply to health check traffic, but I need to double check. Traffic is otherwise split roughly equally between the two tunnels, and when there is an outage in that cluster that affects both, the tunnel that doesn't seem to get monitor checks will still register as "healthy". Does anyone have any idea what might be going on here?
nomaxx117 · 5mo ago
Confirmed that all the health checks also seem to go to one tunnel.
Cyb3r-Jak3 · 5mo ago
Random shot in the dark: are the tunnels connected to the same DCs? Tunnel traffic typically routes through the closest location (not officially, though). Wondering if the health checks are coming from a DC that only one of the tunnels is connected to.
nomaxx117 · 5mo ago
They are both connected to the DFW-A PoP, though different colos. I've turned off the health checks for now to reduce the noise a bit and focus on the knobs I have around the monitors.
Cyb3r-Jak3 · 5mo ago
There might be some internals to how the tunnel traffic is being routed, but no clue.
nomaxx117 · 5mo ago
That's what I'm wondering. I've uninstalled and reinstalled that tunnel to no avail. Strange: removing the tunnel that gets all the monitor requests still leaves the other tunnel getting none. Umm, why are they both one connector? How did I get myself into this situation? Wait a minute. On worker 1 (the one getting all the traffic), cloudflared tunnel info returns the same ID for both tunnels, the ID of worker 1's tunnel. On worker 2, the correct IDs are returned. Wat. I am bamboozled.
Cyb3r-Jak3 · 5mo ago
That’s uh funky. Like the connector or the tunnel IDs?
nomaxx117 · 5mo ago
from heavy-worker-1:
➜ ~ cloudflared tunnel info heavy-worker-1
NAME: heavy-worker-1
ID: aff69054-<REST>
CREATED: 2023-05-19 22:41:36.32161 +0000 UTC

CONNECTOR ID CREATED ARCHITECTURE VERSION ORIGIN IP EDGE
7b612dfc-<REST> 2024-01-14T00:34:27Z linux_arm64 2024.1.2 104.13.171.136 1xdfw01, 1xdfw05, 2xmci01
➜ ~ cloudflared tunnel info heavy-worker-2
NAME: heavy-worker-1
ID: aff69054-<REST>
CREATED: 2023-05-19 22:41:36.32161 +0000 UTC

CONNECTOR ID CREATED ARCHITECTURE VERSION ORIGIN IP EDGE
7b612dfc-<REST> 2024-01-14T00:34:27Z linux_arm64 2024.1.2 104.13.171.136 1xdfw01, 1xdfw05, 2xmci01
➜ ~ cloudflared tunnel info heavy-worker-1
NAME: heavy-worker-1
ID: aff69054-<REST>
CREATED: 2023-05-19 22:41:36.32161 +0000 UTC

CONNECTOR ID CREATED ARCHITECTURE VERSION ORIGIN IP EDGE
7b612dfc-<REST> 2024-01-14T00:34:27Z linux_arm64 2024.1.2 104.13.171.136 1xdfw01, 1xdfw05, 2xmci01
➜ ~ cloudflared tunnel info heavy-worker-2
NAME: heavy-worker-1
ID: aff69054-<REST>
CREATED: 2023-05-19 22:41:36.32161 +0000 UTC

CONNECTOR ID CREATED ARCHITECTURE VERSION ORIGIN IP EDGE
7b612dfc-<REST> 2024-01-14T00:34:27Z linux_arm64 2024.1.2 104.13.171.136 1xdfw01, 1xdfw05, 2xmci01
from heavy-worker-2:
➜ ~ cloudflared tunnel info heavy-worker-1
NAME: heavy-worker-1
ID: aff69054-<REST>
CREATED: 2023-05-19 22:41:36.32161 +0000 UTC

CONNECTOR ID CREATED ARCHITECTURE VERSION ORIGIN IP EDGE
7b612dfc-<REST> 2024-01-14T00:34:27Z linux_arm64 2024.1.2 104.13.171.136 1xdfw01, 1xdfw05, 2xmci01
➜ ~ cloudflared tunnel info heavy-worker-2
NAME: heavy-worker-2
ID: b7561864-<REST>
CREATED: 2024-01-14 00:49:37.245692 +0000 UTC

CONNECTOR ID CREATED ARCHITECTURE VERSION ORIGIN IP EDGE
d537845f-<REST> 2024-01-14T00:51:00Z linux_arm64 2024.1.2 104.13.171.136 1xdfw06, 1xdfw09, 2xmci01
➜ ~ cloudflared tunnel info heavy-worker-1
NAME: heavy-worker-1
ID: aff69054-<REST>
CREATED: 2023-05-19 22:41:36.32161 +0000 UTC

CONNECTOR ID CREATED ARCHITECTURE VERSION ORIGIN IP EDGE
7b612dfc-<REST> 2024-01-14T00:34:27Z linux_arm64 2024.1.2 104.13.171.136 1xdfw01, 1xdfw05, 2xmci01
➜ ~ cloudflared tunnel info heavy-worker-2
NAME: heavy-worker-2
ID: b7561864-<REST>
CREATED: 2024-01-14 00:49:37.245692 +0000 UTC

CONNECTOR ID CREATED ARCHITECTURE VERSION ORIGIN IP EDGE
d537845f-<REST> 2024-01-14T00:51:00Z linux_arm64 2024.1.2 104.13.171.136 1xdfw06, 1xdfw09, 2xmci01
heavy-worker-1 gets all the monitor traffic. How is this possible? I continue to find novel ways of breaking computers lol. @Cyb3r-Jok3 I legitimately have no idea how I did this lmao.
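
A quick way to cross-check which tunnel each host is actually running is to compare the account-level view with the local configuration. This is only a sketch for a locally managed setup; the file paths below are the usual defaults and may differ per install:

cloudflared tunnel list              # tunnels in the account plus their active connector IDs
cat /etc/cloudflared/config.yml      # the tunnel UUID this host is configured to run
ls ~/.cloudflared/*.json             # credentials files are named after the tunnel UUID they belong to

If both hosts point at the same tunnel UUID in config.yml or share a credentials file, they will register as one connector no matter what the dashboard names suggest.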
Cyb3r-Jak3 · 5mo ago
lol, heavy-worker-1 seems cursed.
nomaxx117 · 5mo ago
It really is lol. I'm just gonna delete the tunnel and make a new one.
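
For reference, a minimal sketch of the delete-and-recreate flow for a locally managed tunnel. The hostname is a placeholder, and any config.yml entries or DNS routes pointing at the old tunnel UUID would need updating afterwards:

cloudflared tunnel cleanup heavy-worker-1                          # clear any stale connection records first
cloudflared tunnel delete heavy-worker-1
cloudflared tunnel create heavy-worker-1
cloudflared tunnel route dns heavy-worker-1 worker-1.example.com   # re-point a hostname at the new tunnel UUID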
Cyb3r-Jak3 · 5mo ago
Remote managed tunnels for the win
nomaxx117 · 5mo ago
That did not fix the issue. I am bamboozled. Neither did making the tunnels remote. How would one do this? Also, I too am curious about how I got myself into this mess 😂. I'll go check these. Cloudflared tunnel metrics show even RPS to each tunnel. Weird, but this disagrees with what the logs show when I tail journalctl. I'll do more digging; I wonder if something is borked on my end. Will be double-checking my logging.

So, I figured this out. My nginx cache was broken on the node seeing elevated traffic, and my metrics were generated in a proxy layered after nginx. I first instrumented cloudflared and saw even RPS. Then I instrumented nginx and saw the same. I then realized that the node with higher traffic saw identical traffic before and after the nginx layer, despite the supposed presence of a cache, so maybe that was the broken node? Tailing the error logs there revealed a classic permissions failure that was causing the cache to be circumvented.

As for why only one node got health check alerts, it appears to be due to the nature of the outages I was looking at: the outages were with things like my Redis cluster, not the front-line workers themselves. The cache was preventing the health checks on the second worker from spotting availability issues. I remediated this by fixing the permissions issue and disabling caching on the health check endpoint I was using on both workers.

Part of what threw me off here was that load was legitimately imbalanced: the lack of a cache meant one worker had more RPS than the other hitting everything behind nginx, so things like CPU and memory usage were higher.
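
For anyone hitting something similar, a rough sketch of that remediation. The cache path, nginx user, health-check path, and upstream address are assumptions and will differ per setup:

sudo chown -R www-data:www-data /var/cache/nginx     # fix the permissions failure that was bypassing the cache
# In the relevant nginx server block, exempt the health check path from caching, e.g.:
#   location = /healthz { proxy_cache off; proxy_pass http://127.0.0.1:8080; }
sudo nginx -t && sudo systemctl reload nginx         # validate the config and reload

Keeping the health-check location uncached on both workers means the monitor always sees the live backend, even when the cache in front of everything else is working.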