Cloudflare Developers • DanTheGoodman
Hey I think someone is attacking our DOs
D
DanTheGoodman•244d ago
anyone have ideas on how to verify this?
we haven't deployed any code for months on DOs
or things that connect to DOs IIRC, certainly not in the last ~40 min
I can't even tell what DO it's from
D
DanTheGoodman•244d ago
the log isn't particularly helpful either...
D
DanTheGoodman•244d ago
there isn't even an exception in the exception log
D
DanTheGoodman•244d ago
oh yeah that's weird... had to be an attack
D
DanTheGoodman•244d ago
happened again in a burst, does not seem to be a single object so I might guess it's an infra issue on CF's side?
D
DanTheGoodman•244d ago
added the ID of the object, not all the same, very high cardinality
D
DanTheGoodman•244d ago
we are getting hundreds of thousands of these
tens per second
D
DanTheGoodman•244d ago
M
milan•244d ago
Can you DM me your account ID, I can take a look to see if something is weird
D
DanTheGoodman•244d ago
yep
dm'd, discord might have suppressed the DM notification @milan_cf
D
DanTheGoodman•244d ago
had another burst a bit ago too
M
milan•244d ago
I got the dm, looking
This started about an hour ago?
18:30 utc or so?
Ah no I see, from 17:50 onwards, definitely see an increase in # invocations
D
DanTheGoodman•244d ago
Yeah just before then
And there’s no reason any individual durable object should be getting even within 20 or 30% of that connection limit
Or rather 20% of that limit
Largest we know of would be 7k peak
UU
Unknown User•244d ago
D
DanTheGoodman•244d ago
Alerts
Cloudflare log push to a Grafana stack
D
DanTheGoodman•244d ago
another one
M
milan•244d ago
Yeah, I see a linear increase in the number of websocket connections to some of your DOs
D
DanTheGoodman•244d ago
how high is the cardinality? Seemed pretty high from the logs when I added the IDs to the urls in the workers
(query param)
I also checked twitch and none of the 4 people above that viewer count are using our tools
M
milan•244d ago
Monitoring of ws hibernation isn't really good enough to know how many DO instances were hitting that limit, I can see there were about 900 instances, most of which received very few messages (under 10), between 17:50utc and 20:00utc (so the last 2 hours)
Bit over 700 of them received 10 or fewer messages
D
DanTheGoodman•244d ago
that's expected
M
milan•244d ago
How many connections are you generally expecting to get to a single DO instance? How are you connecting to the DO?
D
DanTheGoodman•244d ago
@milan_cf typically <100, but some can be in the thousands. We are connecting through browser websockets
D
DanTheGoodman•244d ago
ah more
D
DanTheGoodman•244d ago
@milan_cf any luck? It's pretty constant right now
M
milan•244d ago
We don't think the issue is with our infrastructure. There is a significant increase in invocations to your durable object namespace starting at 17:50 UTC, and it's been sustained for a while. It's likely that someone is opening a lot of websocket connections to your DOs and forcing you to hit the connection limit.
D
DanTheGoodman•244d ago
Gotcha, so perhaps an attack then because we pushed no code that modifies how we connect to clients. In my checking of the pretty unhelpful exception logs I do see they are quite spread out among US and EU, but a large number coming from eastern EU. Is there any way to check that? Logpush does not give us that info and this exception happens before our code runs it seems
Unless maybe we can get that info in the worker before connecting to the DO?
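One way to get origin details before the request reaches the DO is to log them in the Worker that forwards the upgrade. A minimal sketch, assuming the Worker can read the `cf` object on the incoming request and the `CF-Connecting-IP` header; `connectionLogFields` and the `RequestLike` shape are hypothetical names for illustration, not code from this thread:

```typescript
// Minimal structural type so the sketch runs outside the Workers runtime.
// In a Worker, the incoming Request carries a `cf` object with these fields.
interface RequestLike {
  url: string;
  headers: { get(name: string): string | null };
  cf?: { country?: string; colo?: string; asn?: number };
}

// Hypothetical helper: collect the fields worth logging for each upgrade
// attempt, including the ID of the Durable Object being targeted.
function connectionLogFields(req: RequestLike, objectId: string) {
  return {
    objectId,
    path: new URL(req.url).pathname,
    country: req.cf?.country ?? "unknown",
    colo: req.cf?.colo ?? "unknown",
    asn: req.cf?.asn ?? null,
    ip: req.headers.get("CF-Connecting-IP"),
    userAgent: req.headers.get("User-Agent"),
  };
}
```

Logging the result as JSON just before forwarding to the DO would give per-country and per-ASN breakdowns in the log destination, independent of what the exception record contains.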
M
milan•244d ago
"this exception happens before our code runs it seems"
You mean logpush or the DO code? This exception should be from acceptWebSocket() throwing in the DO
D
DanTheGoodman•244d ago
see this
there's no exception in the exceptions array
maybe that's not the right exception, but then what is that exception lol
M
milan•244d ago
Where did you get the 32k connection limit exception then?
D
DanTheGoodman•244d ago
logpush
that message from the screenshot is literally all logpush was sending, so I looked into the function call logs for more info
D
DanTheGoodman•244d ago
D
DanTheGoodman•244d ago
it also comes in waves completely uncharacteristic of any behavior from normal users of our app
D
DanTheGoodman•244d ago
and it seems a bit too constant to be our users
M
milan•244d ago
not sure why there's no exception there... I can ask around tomorrow (everyone is currently out for the day). I still think this is some sort of attack, but we definitely need to improve our hibernatable ws monitoring. It's probably worth wrapping acceptWebSocket in a try/catch, or tracking how many ws you have connected and refusing to allow more to connect if you're near the limit (to avoid errors).
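Both suggestions can be combined in one guard. A rough sketch, assuming the DO state exposes `acceptWebSocket()` and `getWebSockets()`; the `tryAccept` helper, the `HibernationState` interface, and the 30,000 soft cap are made up for illustration (the cap is only meant to sit below the 32k limit mentioned in this thread):

```typescript
// Minimal shape of the hibernation API surface used here, so the sketch
// type-checks outside the Workers runtime.
interface HibernationState<WS> {
  acceptWebSocket(ws: WS): void;
  getWebSockets(): WS[];
}

// Hypothetical soft cap, chosen to stay under the 32k connection limit.
const SOFT_LIMIT = 30_000;

type AcceptResult =
  | { ok: true }
  | { ok: false; status: number; reason: string };

function tryAccept<WS>(
  state: HibernationState<WS>,
  ws: WS,
  softLimit: number = SOFT_LIMIT,
): AcceptResult {
  // Refuse early when the object is already near the limit.
  const open = state.getWebSockets().length;
  if (open >= softLimit) {
    return { ok: false, status: 503, reason: `too many connections (${open})` };
  }
  try {
    state.acceptWebSocket(ws);
    return { ok: true };
  } catch (err) {
    // Log our own error so there is something to search for in the logs.
    console.error("acceptWebSocket failed", { open, error: String(err) });
    return { ok: false, status: 503, reason: "acceptWebSocket threw" };
  }
}
```

The caller would return the 101 upgrade response only when `ok` is true, and a plain error response with the given status otherwise.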
D
DanTheGoodman•244d ago
@milan_cf that's what I just pinged my team we are going to do (is try catch that and log our own error)
@.hades32 fyi (my team)
just added some try/catch and additional logging
D
DanTheGoodman•244d ago
ugh perfect timing to stop...
D
DanTheGoodman•244d ago
nothing in the logs still... trying to return a valid response to see if the exceptions go away
D
DanTheGoodman•244d ago
does not seem to be that @milan_cf , as these errors are still blank
D
DanTheGoodman•244d ago
idk what these exceptions even are...
this doesn't seem to be firing
I think those exceptions are unrelated, we aren't getting the connection limit log right now, so that seems to be a second issue we've found @milan_cf
M
milan•243d ago
probably stopped because of reload of all your DOs?
D
DanTheGoodman•243d ago
No it's still happening
been seeing it all night
M
milan•243d ago
are you throwing an exception in your code somewhere?
D
DanTheGoodman•243d ago
I added that code snippet above but it's never being reached, and there are no places that we are throwing an exception ourselves
D
DanTheGoodman•243d ago
ok for some reason as of a few hours ago that error started throwing
D
DanTheGoodman•243d ago
idk why that didn't show last night though
M
milan•243d ago
It looks like almost all your DOs are returning only 201s, a small percentage returning 400s
D
DanTheGoodman•243d ago
but it doesn't have any request headers?
I'm fixing the log to get the headers, and the query
M
milan•243d ago
Yeah so I went back a couple days and that namespace has only been responding with 201s and 400s, mostly 201s though
also be back in 10 I'm getting coffee
D
DanTheGoodman•243d ago
no worries, it stopped a few min before I pushed out the update
I'll sanity check our connection code, but we have nobody above 2k live viewers right now so nobody should be even remotely near that limit
I wonder if it's actually a side-effect of an attack on twitch, because it is through our twitch extension which gets loaded when the twitch page loads
I can see these error logs have our JWT from twitch
I think we might have figured it out, it seems that when we navigate we open a new socket but do not close the old ones... for some reason the browser is keeping them around for 1-2 minutes
now the rate of logs could be logpush throttling how fast they are sent, it looked like about 100/s which is the same limit that exists in the CF dashboard for viewing function logs
M
milan•243d ago
Mind expanding on this? I'm not familiar with what the DOs are doing or how the client works and I'm curious
D
DanTheGoodman•243d ago
@milan_cf Sure, basically we are using them as coordination, we use HTMX so a navigation is replacing the component that connects via websockets. However for some reason that's not disconnecting, we deployed what we think is a temporary fix
Basically I think every time our users did something they opened another socket
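The thread does not show the actual fix. One client-side pattern for this kind of leak is to keep a single socket handle and close the previous one before opening the next. A hedged sketch; `SingleSocket` and `SocketLike` are hypothetical names, and the socket factory is injected so the class does not depend on the browser `WebSocket` global:

```typescript
// Anything with a WebSocket-style close() works here.
interface SocketLike {
  close(code?: number, reason?: string): void;
}

// Holds at most one live socket; connecting again closes the old one first.
class SingleSocket<S extends SocketLike> {
  private current: S | null = null;
  private readonly open: (url: string) => S;

  constructor(open: (url: string) => S) {
    this.open = open;
  }

  connect(url: string): S {
    // 1000 is the normal-closure code.
    this.current?.close(1000, "replaced by navigation");
    const next = this.open(url);
    this.current = next;
    return next;
  }

  dispose(): void {
    this.current?.close(1000, "component removed");
    this.current = null;
  }
}
```

In a browser this could be created as `new SingleSocket((url) => new WebSocket(url))`, with `dispose()` called from whatever hook fires when the component is swapped out.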
M
milan•243d ago
Did it fix the issue?
D
DanTheGoodman•242d ago
@milan_cf The connection limit one yeah, but not entirely, I think that was one issue but I think there still is an attack. We added code to verify that the Twitch token was passed and is valid, and this error happens when no token is passed in
D
DanTheGoodman•242d ago
Now we are rejecting it before accepting the socket, so that removes the connection limit issue, but it still seems like there are similar patterns. And our users are passing tokens (this is the last 12 hours)
perhaps these are viewers not logged in that are viewing though lol, but yeah connection issue solved. But the pattern is just so constant, it doesn't feel like our users
No, twitch gives a token regardless, this doesn't seem to be our users
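Rejecting before accepting the socket could look roughly like the gate below. This is a sketch under assumptions: the thread does not say how the token reaches the server, so the `token` query parameter is invented, and `verifyToken` is a hypothetical callback standing in for the real Twitch JWT check:

```typescript
type Gate =
  | { allow: true; token: string }
  | { allow: false; status: number; reason: string };

// Decide whether an upgrade request may proceed to acceptWebSocket().
function gateUpgrade(
  url: string,
  upgradeHeader: string | null,
  verifyToken: (token: string) => boolean, // hypothetical verifier
): Gate {
  if (upgradeHeader?.toLowerCase() !== "websocket") {
    return { allow: false, status: 426, reason: "expected websocket upgrade" };
  }
  // Assumption: the token travels as a `token` query parameter.
  const token = new URL(url).searchParams.get("token");
  if (!token) {
    return { allow: false, status: 401, reason: "missing token" };
  }
  if (!verifyToken(token)) {
    return { allow: false, status: 403, reason: "invalid token" };
  }
  return { allow: true, token };
}
```

Because the check runs before the socket is accepted, rejected requests never count toward the per-object connection limit, and the returned `status` and `reason` give the logs something to group on.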
M
milan•240d ago
@danthegoodman we found a regression in the hibernation code regarding dispatching the close handler (+ dropping the websocket) upon client disconnects and we're investigating further. Not certain it's affecting you but I suspect it probably is, will keep you updated as I find more
D
DanTheGoodman•240d ago
Appreciate the update!
I wonder if the browser was not getting a close and thus kept reconnecting, as we never had this issue before hibernation
M
milan•240d ago
GitHub
🐛 Bug Report — Runtime APIs: Hibernating WebSockets remain open whe...
Problem We started observing the following strange behaviour some time yesterday. when using Durable Objects Hibernating WebSockets (calling state.acceptWebSocket(socket), when connecting via web b...
M
milan•239d ago
I think the release went out, have the issues been resolved?
D
DanTheGoodman•239d ago
@milan_cf not sure, we added our own code to prevent this in the meantime that has worked for us so far
M
milan•239d ago
Oh I thought you were still seeing problems regardless of that fix, my bad
D
DanTheGoodman•239d ago
no worries lol, we had to do 2 fixes
were other users hitting this?
or are we the largest websocket hibernation user 🤔
M
milan•239d ago
Yeah, it seems like it hit a couple other folks as well. Unfortunately our test in CI didn't verify if the disconnect handler ran, it only confirmed that when it ran everything worked as expected. That coupled with lack of hibernatable ws monitoring made this tricky to confirm w/o reports from users
We fixed the test case + are working on making this class of bugs discoverable at compile time. Will also need to think about some monitoring and metrics for hibernatable websockets
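The gap described here, a test that checks the handler's behavior but not that the handler ran, can be shown with a generic sketch. This is not Cloudflare's test; `FakeSession`, `simulateDisconnect`, and the two checks are invented for illustration:

```typescript
// Stand-in for a connection whose close handler should run on disconnect.
class FakeSession {
  closeCalls = 0;
  socketDropped = false;

  onClose(): void {
    this.closeCalls += 1;
    this.socketDropped = true;
  }
}

// `dispatchClose = false` models a regression where the handler is never called.
function simulateDisconnect(session: FakeSession, dispatchClose: boolean): void {
  if (dispatchClose) session.onClose();
}

// Weak check: only asserts that IF the handler ran, cleanup happened.
function weakCheck(s: FakeSession): boolean {
  return s.closeCalls === 0 || s.socketDropped;
}

// Strong check: asserts the handler ran exactly once AND cleanup happened.
function strongCheck(s: FakeSession): boolean {
  return s.closeCalls === 1 && s.socketDropped;
}
```

With the regression simulated, `weakCheck` still passes while `strongCheck` fails, which is the difference between the old and the fixed test as described above.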
D
DanTheGoodman•239d ago
awesome
M
milan•239d ago
Not the largest but definitely up there 🙂
D
DanTheGoodman•239d ago
😎 ill take it
M
milan•239d ago
Sorry for the trouble, and thanks for the detailed report. We haven't had a larger scale issue w/ hibernation so this will help us with our tooling going forward
D
DanTheGoodman•239d ago
glad it's all sorted!