We're frequently getting "We encountered an internal error. Please try again. (10001)" when making GET requests from a JavaScript Worker to R2 via a binding. Even when retrying for up to 5 minutes on the same worker/binding instance and the same object, we still hit this error. The error rate is low considering the volume of requests we get, but still, our clients can't afford to wait 5+ minutes (really not sure how long it takes to recover on average) for the data. It's a chunked dataset, and for most use cases one failing chunk means the whole dataset is essentially useless for that period of time. Anyone here who could help me out?
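For context, our retry loop is essentially a generic backoff wrapper around the binding call. A minimal sketch (names like `getWithRetry` and the attempt/delay parameters are our own, not anything from the R2 API):

```javascript
// Retry a flaky async operation with exponential backoff plus jitter.
// In the Worker we pass () => env.MY_BUCKET.get(key) as `fetchOnce`;
// here it's any async function that may throw a transient error (10001).
async function getWithRetry(fetchOnce, { attempts = 5, baseMs = 100 } = {}) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fetchOnce();
    } catch (err) {
      lastErr = err;
      // Back off: baseMs, 2*baseMs, 4*baseMs, ... plus random jitter
      const delay = baseMs * 2 ** i + Math.random() * baseMs;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastErr; // all attempts exhausted
}
```

Even with this in place (and much longer total retry windows), the same object keeps failing, which is what makes me think it's not a transient per-request blip.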
WEUR-based with smart placement and tiered caching (if that helps). Error rate is low, like 0.0011%, but the long recovery time is problematic.
I was thinking maybe it's some stateful thing in the worker/binding, and maybe offloading retries via a subrequest to a worker with its own R2 binding might use a new connection and have better luck?
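The idea would look roughly like this sketch. It's pure speculation that the second binding behaves differently; `primaryGet` and `fallbackFetch` are stand-ins for `env.BUCKET.get(key)` and a `fetch()` to the hypothetical fallback worker:

```javascript
// Try the local R2 binding first; on error, retry once through a
// second worker that holds its own binding (and maybe a different
// connection). Both callbacks are injected so the logic is testable.
async function getWithFallback(key, primaryGet, fallbackFetch) {
  try {
    const obj = await primaryGet(key);
    if (obj !== null) return obj;
  } catch (err) {
    // Local binding hit something like 10001; try the sibling worker.
    const res = await fallbackFetch(key);
    if (res !== null) return res;
    throw err; // fallback also failed; surface the original error
  }
  return null; // object genuinely absent
}
```

In the Worker, `fallbackFetch` would be something like `(k) => fetch("https://r2-fallback.example.com/" + k)` (URL is made up), with the fallback worker doing a plain `env.BUCKET.get(k)` on its side.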
I've had it with and without caching, recently switched on tiered caching hoping it might change things, but still about the same error rate.
I haven't tried without smart placement though, maybe there's something in there? I don't know, clutching at straws here.
Sorry, to answer directly: I'm not doing any caching in the worker or elsewhere, other than tiered caching (switched on recently).
Yeah, good advice in general, but I'm getting the error on cache misses and also saw it without caching, so caching wouldn't address the specific problem here: the file is "unavailable" at the origin (R2) due to 10001, and a cache can't serve what the origin won't return.
Lots of concurrent requests from clients. The worker itself is 1 execution per request/file. I don't have exact request rates but yesterday we were at 2M requests over 24 hours. But that fluctuates during the day. I need to check the timings but it could be that the error rates are clustered around times when we're also doing quite a lot of writes/uploads to our R2 buckets...
Oof ok, that explains it then. I didn't think our load was high enough to hit this kind of thing yet, also wasn't expecting 500s, but the docs confirm it then. Thanks for the help @Space!
Having the same issue on my side @kj2 :/ but the number of errors has increased significantly in the last few days without any change in the volume of requests on our end - is it the same for you?
Yeah definitely seeing increased error rate. I'm looking into bucket sharding but it's going to add so much complexity...
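The sharding I'm considering is just deterministic routing of keys across N buckets by hash, something like this sketch (the FNV-1a hash and the `BUCKET_0..BUCKET_3` binding names are my own choices, not anything R2 prescribes):

```javascript
// Pick a shard index for an object key using FNV-1a (32-bit).
// Deterministic: the same key always maps to the same bucket.
function shardFor(key, shardCount) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // FNV prime, keep unsigned 32-bit
  }
  return h % shardCount;
}

// In the Worker (illustrative binding names):
//   const bucket = env["BUCKET_" + shardFor(key, 4)];
//   const obj = await bucket.get(key);
```

The complexity isn't the routing itself, it's migrating existing objects and keeping writes/reads consistent during the move, which is why I'm reluctant.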
Nothing on the status page yet.
also reluctant to jump into sharding right now :/ this person says they tried it a couple months ago and it didn't make a difference: https://community.cloudflare.com/t/get-we-encountered-an-internal-error-please-try-again-10001/809918. let's see, as Space says, could be an incident...
thanks a lot for the detail @Space 🙂 I have very low upload traffic, so won't think too hard about sharding then. I have much higher read traffic - which hasn't previously been an issue, so hoping it is just an incident