Recommendations for TCP_USER_TIMEOUT and keepalive settings
Hi there, I've recently noticed an increase in dead TCP connections to a service that sits behind Cloudflare. I was reading the blog post https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die, which discusses TCP_USER_TIMEOUT and keepalives, and I was wondering whether Cloudflare has any recommended values for these settings. I've seen docs covering settings for the origin server, but not for the client talking to Cloudflare. Any advice would be appreciated, thanks!
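For anyone landing here, this is roughly what setting these options on a client socket looks like. It's a minimal Python sketch assuming Linux (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT and TCP_USER_TIMEOUT are Linux socket options). The function name configure_client_socket and every number in it are hypothetical placeholders, not Cloudflare recommendations. When no user timeout is passed, it falls back to keepidle + keepintvl * keepcnt. That is an assumed rule of thumb meant to keep the two mechanisms consistent, because on Linux a non-zero TCP_USER_TIMEOUT also takes over from TCP_KEEPCNT in deciding when keepalive probing gives up.

```python
import socket
from typing import Optional


def configure_client_socket(sock: socket.socket,
                            keepidle_s: int = 10,
                            keepintvl_s: int = 5,
                            keepcnt: int = 3,
                            user_timeout_ms: Optional[int] = None) -> None:
    """Enable TCP keepalives and TCP_USER_TIMEOUT on a Linux TCP socket.

    Hypothetical helper; the default values are illustrative placeholders,
    not values recommended by Cloudflare.
    """
    # Turn keepalive probing on for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first keepalive probe is sent.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, keepidle_s)
    # Seconds between subsequent unanswered probes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, keepintvl_s)
    # Number of unanswered probes before the connection is dropped.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, keepcnt)
    if user_timeout_ms is None:
        # Assumed rule of thumb: match the keepalive detection window.
        user_timeout_ms = (keepidle_s + keepintvl_s * keepcnt) * 1000
    # Maximum time (ms) sent data may stay unacknowledged before the
    # kernel gives up on the connection.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, user_timeout_ms)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    configure_client_socket(s)
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT))
    s.close()
```

In practice most HTTP or gRPC clients create their sockets internally, so these values usually have to be passed through whatever socket hook the library exposes rather than set on a raw socket like this.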
Fair call! I guess I'm asking because I'm using the default values (which vary by kernel, runtime, and library), and only recently (in the last 3 or 4 days) have I started getting timeouts caused by a TCP_USER_TIMEOUT of 30 s.
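Since the defaults depend on the stack, it can help to check what a socket actually starts with. A minimal Python sketch, assuming Linux: it reads the system-wide keepalive sysctls from /proc and the per-socket values a fresh, untuned TCP socket reports via getsockopt. The helper names are hypothetical. Stock Linux ships with tcp_keepalive_time = 7200 s, tcp_keepalive_intvl = 75 s and tcp_keepalive_probes = 9, keepalives off by default, and a TCP_USER_TIMEOUT of 0 (not set). A runtime or library may override any of these on the sockets it creates, so calling getsockopt on the real socket is the more reliable check where that's possible.

```python
import socket
from pathlib import Path


def kernel_keepalive_sysctls() -> dict:
    """Read the system-wide keepalive defaults from /proc (Linux only)."""
    base = Path("/proc/sys/net/ipv4")
    names = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")
    return {n: int((base / n).read_text().strip()) for n in names}


def fresh_socket_defaults() -> dict:
    """Report what a brand-new TCP socket starts with before any tuning."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return {
            # 0 means keepalive probing is off unless someone enables it.
            "SO_KEEPALIVE": s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE),
            # These fall back to the sysctl values above when not set per socket.
            "TCP_KEEPIDLE": s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE),
            "TCP_KEEPINTVL": s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL),
            "TCP_KEEPCNT": s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT),
            # 0 means no user timeout: the kernel's retransmission limits apply.
            "TCP_USER_TIMEOUT": s.getsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT),
        }


if __name__ == "__main__":
    print(kernel_keepalive_sysctls())
    print(fresh_socket_defaults())
```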
A lot of the defaults are also very conservative. If P99 latency is normally between 500 and 800 ms, then waiting 30 seconds to realise a TCP connection is dead is quite painful.
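To put numbers on that: a 30 s timeout is 37.5x an 800 ms P99 and 60x a 500 ms one, and with stock Linux keepalive settings an idle dead connection takes 7200 + 75 * 9 = 7875 s (over two hours) to be detected by keepalives alone. The Python sketch below does that arithmetic. suggest_user_timeout_ms is a purely hypothetical heuristic (a multiple of P99 with a floor); the multiplier and floor are assumptions to tune, and the floor exists because a timeout that is too tight can kill a healthy connection during a brief burst of packet loss, before a few retransmissions have had a chance to succeed.

```python
def keepalive_detection_s(keepidle_s: int, keepintvl_s: int, keepcnt: int) -> int:
    """Seconds for keepalives alone to declare an idle peer dead."""
    return keepidle_s + keepintvl_s * keepcnt


def suggest_user_timeout_ms(p99_ms: float, multiplier: float = 10.0,
                            floor_ms: int = 5_000) -> int:
    """Hypothetical heuristic: a multiple of P99 latency, never below a floor.

    The floor leaves room for a few retransmissions on a lossy path so a
    short blip doesn't tear down an otherwise healthy connection.
    """
    return max(floor_ms, int(p99_ms * multiplier))


if __name__ == "__main__":
    print(keepalive_detection_s(7200, 75, 9))  # stock Linux defaults: 7875
    print(30_000 / 800, 30_000 / 500)          # 37.5 60.0
    print(suggest_user_timeout_ms(800))        # 8000
    print(suggest_user_timeout_ms(300))        # 5000 (the floor wins)
```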