R2 errors
Interesting 🤔 Not sure if anyone else monitors R2 as closely as I do, but if you do, I'm curious whether you've also noticed that, starting 12/13, the error rate has risen a bit from what it used to be.

Interesting, what kind of operations do you do? Are there certain ops that fail more than others? Do you know what these errors are?
We have an SLO graph internally that isn’t going crazy, so I’m curious what’s going on here.
So I run every operation type (get/put/delete) with files of varying size (0MB, 1MB, 5MB, 25MB). I have a load balancer health check, set to run from all regions, that calls an API which performs 2 operations: one against a US bucket and one against an EU bucket. After each operation completes or errors, I send latency/error data to AE (Workers Analytics Engine) for tracking.
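The probe described above could look roughly like this. It is a minimal sketch, not the poster's actual code: it assumes a Workers-style setup where each R2 bucket binding exposes `get`/`put`/`delete` and the Analytics Engine binding exposes `writeDataPoint`, but it declares its own small interfaces (`BucketLike`, `AnalyticsSink`) so it runs standalone. The functions `probe` and `healthcheck`, the `"US"`/`"EU"` labels, and the data-point layout are all hypothetical.

```typescript
// Hypothetical minimal interfaces: real Workers bindings (R2Bucket,
// AnalyticsEngineDataset) have richer types, but only these calls are used here.
interface BucketLike {
  get(key: string): Promise<unknown>;
  put(key: string, value: string): Promise<unknown>;
  delete(key: string): Promise<void>;
}

type DataPoint = { blobs?: string[]; doubles?: number[]; indexes?: string[] };

interface AnalyticsSink {
  writeDataPoint(point: DataPoint): void;
}

type Op = "get" | "put" | "delete";

// Runs one operation against one bucket, times it, and records latency and
// outcome. Errors are caught so they are recorded instead of escaping the request.
async function probe(
  bucket: BucketLike,
  region: string,
  op: Op,
  key: string,
  payload: string,
  sink: AnalyticsSink,
): Promise<boolean> {
  const start = Date.now();
  let error = "";
  try {
    if (op === "get") await bucket.get(key);
    else if (op === "put") await bucket.put(key, payload);
    else await bucket.delete(key);
  } catch (e) {
    error = e instanceof Error ? e.message : String(e);
  }
  sink.writeDataPoint({
    indexes: [region],                            // one index per data point
    blobs: [op, String(payload.length), error],   // op, payload length, error message ("" on success)
    doubles: [Date.now() - start, error ? 1 : 0], // latency in ms, error flag
  });
  return error === "";
}

// One health-check hit: the same operation against the US and EU buckets.
async function healthcheck(
  us: BucketLike,
  eu: BucketLike,
  op: Op,
  key: string,
  payload: string,
  sink: AnalyticsSink,
): Promise<boolean> {
  const results = await Promise.all([
    probe(us, "US", op, key, payload, sink),
    probe(eu, "EU", op, key, payload, sink),
  ]);
  return results.every(Boolean);
}
```

In a design like this, a data point is only written if the Worker itself gets a result or a thrown error back, so every recorded error implies the request was still running when it happened.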
The most common error I get is Client Disconnect (10054), which happens fairly evenly across all operations. I assume the client is actually still connected just fine, though; otherwise the data wouldn't be in AE, since it only gets written to AE if the error is handled within the request.
A much less common error is "We encountered an internal error. Please try again." (10001), which happens mostly on put operations, and even then not often.
This kind of thing fascinates me, since I do worry about things like increased error rates if I ever do an S3->R2 switchover... Please do report any more findings back to the channel 😄
Thanks for doing that monitoring and reporting it
Yeah, I'm going to take a look at this (or at least get someone else to take a look 😛). Can you give me the account ID you make these requests from?
The client disconnect is what I want to look into a little more.
The internal error you're seeing for PUTs might be due to concurrency, but with your account ID, hopefully I'll be able to see what the exact reason is.
Yeah, sure, the account is dc941e8156f4a1336ca08481cb6d4222.
@sdnts just curious if this ever got looked at? I noticed a user complaining about elevated 500 error rates: https://discord.com/channels/595317990191398933/940663374377783388/1069076313517858888
I also just wanted to say that I see another spike in error rates starting at around 2023-01-27 17:00:00 UTC.
Hey, sorry, yeah, I just looked and I think I know what the problem is. I have a PR up, but we'll likely push it out on Monday since it's the weekend and this seems to affect only a small fraction of your requests.
I'll let you know when we do, though, so you can check whether it helped.
Sounds good, and yeah, definitely a small % 🙂
@sdnts Just curious, did this end up getting pushed?
It did, actually. Let me double-check real quick whether the errors I saw on Sunday are down too.
Okay, yeah, the errors I was seeing earlier are down. I'll let Unsmart confirm whether their error rate is down as well.
So my error rate over the last 24 hours has dropped (image 1), but the overall error rate is now at an even higher peak than after the jump on 12/13, having jumped up again on 1/27 (image 2).
Before 12/13, the average was about 100 errors per 12 hours. From 12/13 to 1/26, it was about 900 per 12 hours. From 1/27 to 1/30, it's been 1500-3000 per 12 hours.


It looks like, with the release that went out today, the error rate should come down to roughly 600-800 every 12 hours. That's still pretty far from the pre-12/13 average of 100 every 12 hours.
I will say that each 12-hour point represents about 300,000 operations, so the error rate is still extraordinarily low even with the recent jumps I'm seeing.
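To put those counts in proportion, here is a small back-of-the-envelope sketch that converts errors per 12-hour window into a percentage, using the thread's own figure of roughly 300,000 operations per window. It works out to about 0.03% before 12/13, 0.3% from 12/13 to 1/26, 0.5% to 1% from 1/27 to 1/30, and about 0.2% to 0.27% at the expected post-release level. The labels and the `errorRatePercent` helper are illustrative only.

```typescript
// Approximate number of operations behind each 12-hour point (from the thread).
const OPS_PER_WINDOW = 300_000;

// Errors per 12-hour window for each period, as reported in the thread.
const periods: Array<[string, number]> = [
  ["before 12/13", 100],
  ["12/13 to 1/26", 900],
  ["1/27 to 1/30 (low end)", 1500],
  ["1/27 to 1/30 (high end)", 3000],
  ["expected after release (low end)", 600],
  ["expected after release (high end)", 800],
];

// Error rate as a percentage of operations in the same window.
function errorRatePercent(errors: number, ops: number): number {
  return (errors / ops) * 100;
}

for (const [label, errors] of periods) {
  console.log(`${label}: ${errorRatePercent(errors, OPS_PER_WINDOW).toFixed(3)}%`);
}
```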
Here are the top errors over the last 6 hours, by operation type and error message: mostly client disconnects, followed by internal errors. (The top one, "network connection lost", can be ignored; that's a DO (Durable Objects) error that isn't included in the R2 graphs.)
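A breakdown like this (errors grouped by operation type and message, with the Durable Objects error filtered out) is essentially a count-and-sort. The sketch below does it in memory over hypothetical records; in practice the same grouping would more likely be done by querying the Analytics Engine data directly, so treat the `ErrorRecord` shape and the `topErrors` helper as illustrative assumptions rather than the poster's tooling.

```typescript
// Hypothetical shape of one recorded error.
interface ErrorRecord {
  op: string;
  message: string;
}

interface ErrorCount {
  op: string;
  message: string;
  count: number;
}

// Counts errors per (operation, message) pair, most frequent first,
// skipping any message for which `exclude` returns true.
function topErrors(
  records: ErrorRecord[],
  exclude: (message: string) => boolean = () => false,
): ErrorCount[] {
  const counts = new Map<string, ErrorCount>();
  for (const r of records) {
    if (exclude(r.message)) continue;
    const key = `${r.op}\u0000${r.message}`; // NUL separator keeps op/message unambiguous
    const entry = counts.get(key);
    if (entry) entry.count++;
    else counts.set(key, { op: r.op, message: r.message, count: 1 });
  }
  return Array.from(counts.values()).sort((a, b) => b.count - a.count);
}
```

Filtering happens before counting, so an excluded error never shows up in the ranking at all, which matches how the Durable Objects error is set aside above.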

It’s sorta curious to me that client disconnects are so common for you too. We see a lot of client disconnects on our end as well, but they're hard to track down because they really could just be a dropped connection.
In your case I suppose it means that we (R2) closed the connection right?
Yes that's correct. I only save data in AE if I actually get a response back during the request.
Yeah, sorry about that; it's all actively being investigated.