US traffic is being served by London and France datacenters
Around 13:00 UTC, most of our US traffic starts being pointed to Cloudflare's France and London datacenters. This causes our Europe region to sustain traffic it wasn't designed for, and that region goes down or slows down a lot.
This repeats daily until about 01:00 UTC, when US traffic is again served by US datacenters.
As the traffic goes overseas, the ping for end users increases.
The only way to control that was to shut down the EU Load Balancer pool, but that does not resolve the increased latency.
This can be seen in the HTTP Traffic chart by filtering to US traffic and grouping by data center.
The chart directly correlates with what we see in our internal traffic analysis.
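For anyone who wants the same breakdown outside the dashboard, here is a rough sketch of pulling it from the GraphQL Analytics API. The dataset and field names (httpRequestsAdaptiveGroups, coloCode, clientCountryName) reflect my reading of the schema, and the token, zone tag, and time window are placeholders, so verify against the schema explorer before relying on it:

```ts
// Sketch: count US-client requests grouped by the Cloudflare data center that
// served them, over a window like the 13:00-01:00 UTC pattern described above.
const apiToken = "CF_API_TOKEN"; // placeholder: token with Analytics Read
const zoneTag = "YOUR_ZONE_TAG"; // placeholder: zone ID

const query = `{
  viewer {
    zones(filter: { zoneTag: "${zoneTag}" }) {
      httpRequestsAdaptiveGroups(
        limit: 100
        filter: {
          clientCountryName: "US"
          datetime_geq: "2025-08-25T13:00:00Z"
          datetime_lt: "2025-08-26T01:00:00Z"
        }
        orderBy: [count_DESC]
      ) {
        count
        dimensions { coloCode }
      }
    }
  }
}`;

const resp = await fetch("https://api.cloudflare.com/client/v4/graphql", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiToken}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ query }),
});

// Each entry is { count, dimensions: { coloCode } }; US traffic landing in
// LHR/CDG shows up as non-US colo codes with large counts.
console.log(JSON.stringify(await resp.json(), null, 2));
```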

known: https://discord.com/channels/595317990191398933/1408149529202786325/1408151056877097010
Upgrading plans can help, but it's not guaranteed and only incremental. I have some testing on it, and for this issue specifically: Free -> Pro is maybe 20% less rerouting, Biz another 20% or so less, and with Argo on Free, or on Ent, it's mostly gone but still slightly there. There are no guarantees though; they just need to expand capacity in the region. Not many other details have been shared though.
if you just want it served from wherever gives the lowest available latency, it's doing that at present, and it's surprising-but-true that if your user is on the US east coast, London is closer than California
if you specifically care about always being served in-region, that's what https://developers.cloudflare.com/data-localization/regional-services/ is for. if you just want low latency, this is "working as intended" but sometimes there isn't enough for everybody and there is a priority order
Cloudflare Docs
Regional Services
Regional Services gives you the ability to accommodate regional restrictions by choosing which subset of data centers decrypt and service HTTPS traffic.
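If in-region processing is the hard requirement, the configuration ends up being a regional hostname from the Data Localization Suite (a paid add-on), which pins where traffic is decrypted rather than optimizing latency. A hedged sketch of what that API call might look like; the endpoint path and the "us" region key are my reading of the regional hostnames docs, and the zone ID, token, and hostname are placeholders, so check the current docs before using it:

```ts
// Sketch: pin a hostname so that only US data centers decrypt and serve it.
const apiToken = "CF_API_TOKEN"; // placeholder
const zoneId = "YOUR_ZONE_ID";   // placeholder

const resp = await fetch(
  `https://api.cloudflare.com/client/v4/zones/${zoneId}/addressing/regional_hostnames`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      hostname: "app.example.com", // hypothetical hostname to restrict
      region_key: "us",            // assumed key for "US data centers only"
    }),
  }
);
console.log(await resp.json());
```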
and for east-coast users, you're competing with everybody who runs massive workloads in us-east-1, the world's most overloaded cloud region. capacity in that part of the world is a recurring challenge
(it happens to be a bit worse than usual in the past week or so, for reasons that have taken up most of my past couple of weeks, but what you personally are observing is likely just that you used to fit into US east coast capacity and you've been pushed out)
if the actual problem that's hurting you is that this has moved too much of your origin pull traffic to origins in Europe, you might want to insert https://developers.cloudflare.com/load-balancing/load-balancers/ to force some of it back to the US, instead of relying on whatever is closest to the cloudflare location the user is served from
Cloudflare Docs
Load balancers
A load balancer distributes traffic among pools according to pool health and traffic steering policies. Each load balancer is identified by its DNS hostname (lb.example.com, dev.example.com, etc.) or IP address.
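A hedged sketch of what that load balancer setup could look like via the API, keeping US origin pulls on a US pool per edge region. The pool IDs, hostname, zone ID, and token are placeholders, and note that geo steering's region_pools map keys off the Cloudflare region serving the request, which is the caveat raised in the next replies:

```ts
// Sketch: create a load balancer whose geo steering maps North American edge
// regions to the US origin pool and European edge regions to the EU pool.
const apiToken = "CF_API_TOKEN"; // placeholder
const zoneId = "YOUR_ZONE_ID";   // placeholder
const usPool = "US_POOL_ID";     // placeholder: e.g. a Columbus pool
const euPool = "EU_POOL_ID";     // placeholder

const resp = await fetch(
  `https://api.cloudflare.com/client/v4/zones/${zoneId}/load_balancers`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name: "lb.example.com", // hostname the load balancer answers on
      fallback_pool: usPool,
      default_pools: [usPool, euPool],
      steering_policy: "geo",
      // Cloudflare edge region -> origin pools. WNAM/ENAM are the North American
      // edge regions; WEU/EEU the European ones.
      region_pools: {
        WNAM: [usPool],
        ENAM: [usPool],
        WEU: [euPool],
        EEU: [euPool],
      },
    }),
  }
);
console.log(await resp.json());
```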
US east coast traffic is routed to London and France, which are further away than our load balancer pool located in Columbus.
Forcing the CF Europe edge servers to route US traffic back to the US seems costly in terms of ping.
It does not matter whether the Europe pool is on; they just route requests to Europe edge servers.
I'm having the same issue. I wouldn't normally mind if traffic went to London or France, but the performance for these data centers is very poor.
I'm using Workers to get data from R2 (just simple uploads, usually less than 200 KB), and those take over 30 seconds to load, even when cached and not going to the origin (R2)
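For context, a minimal sketch of the kind of Worker being described: read a small object from an R2 binding and cache it at the edge. The binding name, key handling, and TTL are assumptions, not the poster's actual code; the relevant point is that the colo-local cache is cold whenever requests get rerouted to a different data center:

```ts
// Minimal Worker sketch (assumes @cloudflare/workers-types and an R2 bucket
// bound as BUCKET in the wrangler config).
interface Env {
  BUCKET: R2Bucket;
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    if (request.method !== "GET") {
      return new Response("method not allowed", { status: 405 });
    }
    const key = new URL(request.url).pathname.slice(1);
    if (!key) return new Response("missing key", { status: 400 });

    // The Cache API is per data center: a request rerouted to LHR/CDG misses the
    // cache that was warmed in a US colo and pays the full R2 round trip again.
    const cache = caches.default;
    const cached = await cache.match(request);
    if (cached) return cached;

    const object = await env.BUCKET.get(key);
    if (object === null) return new Response("not found", { status: 404 });

    const headers = new Headers();
    object.writeHttpMetadata(headers); // copy stored content-type etc.
    headers.set("etag", object.httpEtag);
    headers.set("cache-control", "public, max-age=3600"); // assumed TTL

    const response = new Response(object.body, { headers });
    ctx.waitUntil(cache.put(request, response.clone()));
    return response;
  },
};
```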
I understand that Cloudflare can't handle the increased traffic to US data centers, so they are routing less important customers to EU data centers.
But they also can't handle increased traffic to EU data centers either.
yeah it's a multidimensional problem. I've been very busy this past couple of weeks. I can't give any forward-looking statements but there's plenty of attention on what's happening
I am cautiously optimistic about the situation improving. it's never going to be ideal, but the number of users served from further away should be lower than it is right now
Glad to hear that it's being looked at and fixed.
Is this issue why I'd be seeing increased 520x errors?
This only started happening on August 21st, no changes to my nginx config (or my code) on my end.
I can't seem to figure this out. It seems to be an issue only when Cloudflare starts pushing people to EU data centers, but the issue is intermittent, which leads me to believe it comes down to how overloaded the Cloudflare data centers are.
I've tried switching servers and I see the same issue, although with varying severity. But even that depends on how overloaded the Cloudflare data centers are, assuming that's the issue...
The time I'm seeing this is around 2:00 - 2:40 PM UTC



it's still happening

The IP is on the CDN77 network in their Ashburn DC
Are you seeing any 520x errors on your end @Frerduro?
no


things moving around inside the US is "normal", especially at this time of year
don't expect this to be an overnight change, it's slower moving
I mean connecting to Paris and Melbourne, AUS is outside the US
I was playing around with this and built out https://delay.chaika.me/routing/ if it helps anyone to see the plan differences.
This is done by testing from a bunch of locations in NA (SEA, PDX, SJC, LAS, SLC, MCI, DFW, ATL, MIA, ORD, DTW, YYZ, EWR, IAD), using datacenter connections, usually over IX or direct peering, so it should entirely be CF shuffling requests between DCs.
It's not worth upgrading to Pro to get away from it; Business is mostly unaffected. Argo & Ent are mostly not affected.
Cloudflare Routing Monitoring
See Cloudflare Routing, using Workers running on each plan returning static content.
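For anyone wanting to reproduce a cruder version of that monitoring, here is a hedged sketch of a probe Worker that just reports which data center handled the request (an illustration, not the actual code behind the page above). Deploy one per plan and poll them from fixed vantage points to log "requested from X, served at Y" over time:

```ts
// Minimal probe Worker: returns the colo that executed it plus the client country.
export default {
  async fetch(request: Request): Promise<Response> {
    // request.cf is populated by the Workers runtime; the cast keeps this sketch
    // independent of the exact @cloudflare/workers-types version.
    const cf = (request.cf ?? {}) as { colo?: string; country?: string };
    return Response.json({
      colo: cf.colo ?? "unknown",       // e.g. "IAD", "LHR", "CDG"
      country: cf.country ?? "unknown", // client country as Cloudflare sees it
      checkedAt: new Date().toISOString(),
    });
  },
};
```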
idk, it's just weird. I fully accept getting redirected to other DCs in the US. But Australia is about as far as you can go.
The rerouting seems pretty across the board; some PoPs are affected less than others, but they all shift. I don't imagine sending a request to another PoP near capacity would help
From what I've seen, most requests which are rerouted get sent to entirely different regions for processing. Like a while ago the Oceania region (Australia and New Zealand) had capacity issues and most requests got flung all the way to Europe. I'm not sure if it's because other DCs are also just close enough, or if the logic just plays it safe and wants to ensure the forwarding location has capacity; it's just observed behavior
I have been seeing this behavior with stuff we host ourselves. We have an app hosted in an Ashburn DC with these upstreams, and even if the request goes from the same server rack (but a different machine) to the public domain of the app, I have seen:
Ashburn server #2
> Cloudflare AUS
> Ashburn server #1
> Cloudflare AUS
> Ashburn server #2
The same behavior has been seen with residential ISPs connecting to the same public URL
and also with our static HTML CF Pages website
¯\_(ツ)_/¯ It's not your routing, if that's what you're saying; it's CF internally flinging the request from one DC to another for capacity reasons
https://blog.cloudflare.com/meet-traffic-manager/
probably Plurimog
If a request goes into Philadelphia and Philadelphia is unable to take the request, Plurimog will forward to another data center that can take the request, like Ashburn, where the request is decrypted and processed. Because Plurimog operates at layer 4, it can send individual TCP or UDP requests to other places which allows it to be very fine-grained: it can send percentages of traffic to other data centers very easily, meaning that we only need to send away enough traffic to ensure that everyone can be served as fast as possible
so you're telling me that zero other Cloudflare PoPs in the US have had capacity for the past week+
hell, I'd prefer EU over AUS
I've even seen Singapore and New Zealand
That goes back to https://discord.com/channels/595317990191398933/1409539854747963523/1410375823994912888
anyway, not much you can do other than wait it out or upgrade
I have data going back over a year and it wasn't ever this wide until now besides small bumps, at least from my simple testing against Workers

How do you have data for all these plans btw? You must be spending a ton of money just to collect data, right?
CF is very nice and gives Community Champs & MVPs Enterprise and all the other plan levels
All the monitoring endpoints are just separate small VPSes feeding back. The primary purpose of my monitoring stuff was more like monitoring Worker Script Deployment https://delay.chaika.me/job/worker, but I also log which CF location deals with the requests, so I've always just kinda had this data
full enterprise plan?
There's no such thing as a "full enterprise plan"; it's all piecemeal/requested in bits, but we can ask for most features and get them. It's all non-commercial personal/testing usage, but the upside is that if someone asks whether API Shield can do x or y (or about a feature which requires it), we can test it and see that it can, or find problems with them and escalate them, etc.
There have been a decent amount of incidents that Champ monitoring data has helped raise or find; Workers Deployments used to be way more unstable, for example.
looks consistent with what I've been glaring at. it's supposed to look more like it did before the start of august - some, but not as much
today might have been a bit better than earlier in the week; it should improve a bit more over the next couple of days
nope, well at least in my case. yesterday was much better. we have 3 regions: us-east, us-west, eu. the day before it was balanced in periods, roughly 1/3 per region, nowhere near perfect, but much better. tonight it came in bursts: 1h EU, then EU almost taken out, then again EU takes most of the US traffic. Attaching a screenshot just for the US traffic and the balance over the regions mentioned

time in chart GMT+3
I mainly use the services that I've got running through tunnels during the daytime, which is when I see the most impact. I think that it appearing better during the night is just a symptom of less usage for the service or something. It looks like it's been pretty consistent based on the data that Chaika has so kindly been providing.

The only plan that seems to have zero impact is Ent Spectrum HTTP. Someone needs to let me in on the secret for getting a trial for that. :SAD:
yeah free will be the last thing to stop spilling out of region
it's hard to predict when that will be, it always does it at least a little bit
I'm not sure if paying for Pro for 8-9 years counts as paying enough, but apparently we are still getting spilled
enterprise goes first, and that's a lot of traffic
One thing I don't get is why I never see US west or US south? I see AUS, Singapore, Paris, etc. first
It's either US east or across an ocean for me, nothing in between
That's a good question. I'm also not sure why they don't route things to Canada or the Caribbean instead. Both of those are still faster than the EU, Australia, or Japan datacenters...
you'd expect so, but network paths take surprisingly strange routes. I looked into this because it seemed weird to me, but they are actually closer by latency
not by very much, there's only a couple ms difference
I'm hoping for another chunk to move back later today. it's going to be an ongoing process though
at least for me it does look way better on free today, there's some amount of in region rerouting (like ORD/EWR to MIA), but staying in region at least
it'll affect different zones and plans at different times, because it still doesn't all fit. it does appear somewhat better today though
yeah didn't even think of canada
obv I don't know all the internal numbers, but if they're shifting due to capacity limits in the US there's no way the Caribbean is going to have enough capacity, and I don't think CF is very big in Canada either, no DO hosts there or anything. Makes sense to shift to other big regions
Different regions have different peaks too; it makes sense to do kind of the opposite of follow-the-sun with capacity shifting
so far seems better but not perfect

I am just glad this issue isn't affecting CF Magic Transit at all, it seems
I've got a question: does the Pro plan get any kind of priority, or is it treated like the Free plan?
yes, it's between the two
Seems better today on my end too, almost no timeouts
Tomorrow will be a good test since there was downtime on the 23rd (last Saturday)
From what I saw, when it was at its peak for a few hours earlier this week, it was like:
Free: ~50% of conns being rerouted
Pro: ~40%
Biz: ~20%
Ent/Argo: a few %
Noticeably less for Pro but still painful
Our host recently had to switch our IP subnet from DataPacket to Magic Transit temporarily again because of an ongoing 3+ Tbps DDoS, so I'm glad to see Magic isn't being rerouted across the world like HTTP is
At least last weekend there was only a minimal amount of rerouting over the weekend as well; it's mostly weekdays
things should be significantly better now. still keeping an eye on it though
mostly yes.

that's pretty close to expectations (although something else is still wrong here; I don't think it's affecting you based on those numbers, but others might have different experiences)
What I'm curious about is: how much extra capacity does Paris have, to be eating so much traffic?
consistent #2 through the weeks
peak time is at different times in different timezones
Seeing timeouts, users being served from EU data centers again today
Mostly at CPH
Time on the graph is UTC-4


Seeing timeouts again today


we're still poking at it. things are generally much better now but it still might take a while to nail it all back down
keep in mind that for free accounts, we don't try to make this go to zero. it should be lower on higher plan tiers, ending up on zero for enterprise customers in most of the world (South America and Africa will always do some of it, there are limits to what is achievable)
Fair enough -- I don't expect it to ever go to 0 since I'm on the Pro plan
I am noticing that the timeouts are happening less frequently, which is how it worked before.
Even prior to all of these issues I saw rerouting happening occasionally, but there weren't any timeout issues that I saw, or if there were, they were so infrequent that it wasn't a concern
I don't mind as much if people get re-routed to the EU or wherever as long as it works