Problem with Redis after migrations (ECONNRESET)

Hi! I have a service with Medusa.js (Node), Redis and Postgres which I've been running since September. I've had no problems with this service before, but since I migrated to the new databases the service sometimes stops working (not immediately, but after a while, seemingly at random times). It doesn't crash, but the API just times out or returns 404s. I've found the logs that I think point to the culprit, but I haven't found a solution yet. Could it have something to do with Redis timing out or hitting memory limits? That's just a guess on my part so far. Here are the logs:
Error: read ECONNRESET
    at TCP.onStreamRead (node:internal/stream_base_commons:217:20) {
  errno: -104,
  code: 'ECONNRESET',
  syscall: 'read'
}
AbortError: Ready check failed: Redis connection lost and command aborted. It might have been processed.
    at RedisClient.flush_and_error (/app/node_modules/redis/index.js:298:23)
    at RedisClient.connection_gone (/app/node_modules/redis/index.js:603:14)
    at Socket.<anonymous> (/app/node_modules/redis/index.js:227:14)
    at Object.onceWrapper (node:events:632:26)
    at Socket.emit (node:events:517:28)
    at TCP.<anonymous> (node:net:350:12) {
  code: 'UNCERTAIN_STATE',
  command: 'INFO'
}
[ioredis] Unhandled error event: Error: read ECONNRESET
Error: read ECONNRESET
    at TCP.onStreamRead (node:internal/stream_base_commons:217:20) {
  errno: -104,
  code: 'ECONNRESET',
  syscall: 'read'
}
AbortError: Ready check failed: Redis connection lost and command aborted. It might have been processed.
    at RedisClient.flush_and_error (/app/node_modules/redis/index.js:298:23)
    at RedisClient.connection_gone (/app/node_modules/redis/index.js:603:14)
    at Socket.<anonymous> (/app/node_modules/redis/index.js:227:14)
    at Object.onceWrapper (node:events:632:26)
    at Socket.emit (node:events:517:28)
    at TCP.<anonymous> (node:net:350:12) {
  code: 'UNCERTAIN_STATE',
  command: 'INFO'
}
[ioredis] Unhandled error event: Error: read ECONNRESET
32 Replies
Percy (5mo ago)
Project ID: 9b3bd973-ba5c-4ef3-9f43-32f499f7ba19
Rasmus Lian (5mo ago)
9b3bd973-ba5c-4ef3-9f43-32f499f7ba19
codico (4mo ago)
FYI, I've started experiencing this recently as well, with a PG instance; I didn't use to have these issues before
Rasmus Lian (4mo ago)
Hm, really strange. It's really messing our product up at the moment
codico (4mo ago)
I'm not seeing it on both of my services, so it could be an app-layer issue, but it didn't use to happen 🤷‍♂️
Brody (4mo ago)
oftentimes this happens when you aren't closing connections and the idle timeout is reached
codico (4mo ago)
On my end the websockets are maybe suspicious; I'll take a look at some websocket settings
Brody (4mo ago)
for postgres pooled clients specifically, this is solved by setting the pool minimum to 0 so that all connections are released and marked as closed
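A minimal sketch of that suggestion, assuming a knex/tarn-style pool (neither project in this thread has confirmed using knex, so the option names and values are illustrative):

```js
// Hypothetical knex setup: tarn (knex's pool) accepts min/max directly.
const knex = require("knex")({
  client: "pg",
  connection: process.env.DATABASE_URL,
  pool: {
    min: 0,                   // keep no idle connections open, per the suggestion above
    max: 10,                  // cap on concurrent connections
    idleTimeoutMillis: 30000, // destroy a pooled connection after 30s of sitting idle
  },
});
```

With min: 0 the pool tears connections down once they go idle, so the database's (or proxy's) idle timeout never gets a chance to reset them while the app still thinks they're usable.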
Rasmus Lian (4mo ago)
Where do I set that?
Brody (4mo ago)
your issue looks to be with redis; either way, you would need to reference the documentation for your database client
Rasmus Lian (4mo ago)
Ah true. Is there a similar pool minimum setting for Redis?
Brody (4mo ago)
not sure, you would need to reference the documentation for your database client
Rasmus Lian (4mo ago)
So in my case that would be Medusa, you mean?
Brody (4mo ago)
medusa is not a database client, the redis npm package is
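As a reference point, the "[ioredis] Unhandled error event" lines in the original logs come from an ioredis instance that has no 'error' listener attached. A minimal sketch of handling and retrying a dropped socket, assuming ioredis and a single connection URL:

```js
const Redis = require("ioredis");

const redis = new Redis(process.env.REDIS_URL, {
  // reconnect with a small backoff instead of giving up after a reset socket
  retryStrategy: (times) => Math.min(times * 200, 2000),
  // enable TCP keepalive probes so an idle connection isn't silently dropped
  keepAlive: 10000,
});

// without this listener, a read ECONNRESET surfaces as "Unhandled error event"
redis.on("error", (err) => {
  console.error("Redis connection error:", err.message);
});
```

The AbortError with UNCERTAIN_STATE in the logs comes from the older node-redis client (the redis npm package), which exposes a similar retry_strategy option.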
Rasmus Lian (4mo ago)
Okay, will check there
codico (4mo ago)
Thank you for the help, Brody! If the connections to the DB are idle, does that mean there are no DB operations? I should have constant traffic 🤔 maybe I have some other underlying issue, or I'm misunderstanding. Nonetheless, trying min: 0 and hoping for the best! 😄
Brody (4mo ago)
what tech stack are you using?
codico (4mo ago)
nestjs with socketio, and typeorm to connect to PG. I'm suspicious of socketio as well
Brody (4mo ago)
does typeorm have a pool.min setting?
codico (4mo ago)
They have a poolSize, which is the max, but they also accept extra and pass it on to the underlying driver. So that should accept min, I believe
Brody (4mo ago)
what's the underlying driver in use?
codico (4mo ago)
pg. And as you say that, I'm realizing that maybe it doesn't have a min setting and I need to use idleTimeoutMillis or allowExitOnIdle instead
Brody (4mo ago)
allowExitOnIdle seems like what we want
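Passed through TypeORM's extra, that would look roughly like the sketch below; the keys under extra are pg Pool options, and the values are just placeholders:

```js
const { DataSource } = require("typeorm");

// Hypothetical NestJS/TypeORM data source; everything under `extra`
// is handed straight to the underlying pg Pool constructor.
const dataSource = new DataSource({
  type: "postgres",
  url: process.env.DATABASE_URL,
  extra: {
    max: 10,                  // pg Pool: upper bound on pooled clients
    idleTimeoutMillis: 30000, // pg Pool: disconnect a client after 30s idle in the pool
    allowExitOnIdle: true,    // pg Pool: don't let idle pooled clients keep the event loop alive
  },
});
```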
codico (4mo ago)
Thank you for the help, I'll try it out!
Brody (4mo ago)
let me know how that goes!
meng_socal (4mo ago)
the problem here is the proxy and the TCP protocol
Brody (4mo ago)
sorry, but the issue here does not lie with railway; they have internal monitoring for these kinds of things and nothing has been reported. these errors are due to how the client is handling connections
meng_socal (4mo ago)
What do you know!
latrapo (3mo ago)
@Rasmus Lian I have the same issue with medusa; it only started after we migrated. Have you found a solution?
Rasmus Lian (3mo ago)
@latrapo Yes, I think I solved it by upgrading Medusa and the Redis services (the cache service and notification provider) and making sure the Redis config is correct. I think that was it. I'm afk atm so can't give you more right now
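For anyone landing here later: in Medusa v1, "making sure the Redis config is correct" usually amounts to pointing both the project config and the Redis-backed modules at the same connection URL. A rough sketch of medusa-config.js; the exact module names and keys depend on your Medusa version, so treat this as a starting point rather than Rasmus's exact setup:

```js
// medusa-config.js (sketch): assumes @medusajs/event-bus-redis and
// @medusajs/cache-redis are installed and REDIS_URL points at the new instance
const REDIS_URL = process.env.REDIS_URL;

module.exports = {
  projectConfig: {
    redis_url: REDIS_URL,
    database_url: process.env.DATABASE_URL,
    database_type: "postgres",
  },
  modules: {
    eventBus: {
      resolve: "@medusajs/event-bus-redis",
      options: { redisUrl: REDIS_URL },
    },
    cacheService: {
      resolve: "@medusajs/cache-redis",
      options: { redisUrl: REDIS_URL },
    },
  },
};
```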
latrapo (3mo ago)
Thanks! Will try doing that in the meantime 🙏
Brody (3mo ago)
make sure you aren't keeping any idle connections around, and be sure to close a connection when you're done with it
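A minimal sketch of that last point, assuming ioredis and a graceful-shutdown hook (the handler names are illustrative):

```js
const Redis = require("ioredis");
const redis = new Redis(process.env.REDIS_URL);

async function shutdown() {
  // quit() sends QUIT and waits for pending replies before closing the socket,
  // so no idle connection is left behind for the server or proxy to reset later
  await redis.quit();
  process.exit(0);
}

process.on("SIGTERM", shutdown);
process.on("SIGINT", shutdown);
```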