Novu•2y ago

Error: Missing lock for job

I am seeing the following error message repeated in the console by the worker:

 2023-09-04 15:33:31 Error: Missing lock for job repeat:6ebd35806c12b88fd2ba60acc9ec4828:1693859611000. failed
 2023-09-04 15:33:31 at Scripts.finishedErrors (/usr/src/app/node_modules/.pnpm/[email protected]/node_modules/bullmq/src/classes/scripts.ts:355:16)

I haven't been able to figure out what's causing it, but I think it's somewhat related to a workflow step failing due to the subscriber not having a channel configured.

1. Can that error be fixed?
2. Is there a way to skip a step if the user doesn't have the channel configured?

IMO this shouldn't be an error, especially for push notifications, since the user could choose not to accept them.

22 Replies

Pawan Jain•2y ago

Hi @jarredwitt
- Can you please share steps to reproduce this error?
- What does your workflow look like? How many steps does it have?
- Are you self-hosting Novu? Which version?

jarredwittOP•2y ago

Hi @Pawan Jain, attached is our workflow. I can't really share steps to reproduce it because the error just repeats in the console upon startup of the worker container. I am self-hosting and have been since v12. We are on the latest version and have run all of the migration scripts without error. This first started occurring when we upgraded to v14.

jarredwittOP•2y ago

Seems like it has something to do with BullMQ not getting a lock, but I don't have a deep enough understanding of that process to confirm.
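
For readers unfamiliar with job locks, here is a toy in-memory model of how an error of this kind can arise: a worker takes a lock with a time-to-live when it starts a job, and finishing the job fails if that lock no longer exists. This is only a sketch under stated assumptions. `ToyLockStore` and all of its methods are hypothetical names, and this is not BullMQ's implementation or a diagnosis of the error in this thread.

```typescript
// Toy in-memory model of a job lock with a TTL. All names are hypothetical;
// this is NOT BullMQ's code, only an illustration of the failure mode.
type Lock = { token: string; expiresAt: number };

class ToyLockStore {
  private locks = new Map<string, Lock>();

  // A worker takes the lock when it starts processing a job.
  acquire(jobId: string, token: string, ttlMs: number, now: number): void {
    this.locks.set(jobId, { token, expiresAt: now + ttlMs });
  }

  // Finishing a job requires that the same worker still holds a live lock.
  finish(jobId: string, token: string, now: number): string {
    const lock = this.locks.get(jobId);
    if (!lock || lock.expiresAt <= now) {
      // The lock was never taken, or it expired before the job finished.
      this.locks.delete(jobId);
      throw new Error(`Missing lock for job ${jobId}`);
    }
    if (lock.token !== token) {
      // Another worker holds the lock for this job.
      throw new Error(`Lock mismatch for job ${jobId}`);
    }
    this.locks.delete(jobId);
    return "finished";
  }
}
```

In this model, a job that outlives its lock's TTL, or that is finished without ever being locked, fails with a "Missing lock" error even though nothing is wrong with the job's own logic.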

jarredwittOP•2y ago

Attached is the current console from the worker container

jarredwittOP•2y ago

@Pawan Jain any idea on what could be causing this lock issue?

--•2y ago

I am assuming that you haven't configured the environment variables of the Redis Cache service:

 REDIS_CACHE_SERVICE_HOST?: string;
 REDIS_CACHE_SERVICE_PORT?: string;

The Redis Cache Service handles the connection for our Cache Service (a performance feature) and for the Distributed Lock Service (which ensures concurrency is handled properly for notification scheduling features such as Digest and Delay). The error relates to the second one, as you might not have configured the Redis Cache Service. It is optional, though. One option is to enable it by setting the same values as the Redis instance you use in:

 REDIS_HOST: string;
 REDIS_PORT: number;

Just be aware that with a high volume of processing, your Redis instance may not be able to cope with all the information stored in it. We hit that limit some time ago and moved our Redis instances to dedicated Redis Cluster compatible solutions in AWS.
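
As a minimal sketch of the setup described above, the following helper resolves the host and port for the cache and lock connection, using the main Redis values when the cache-specific variables are unset. `resolveCacheRedis`, `Env` and `RedisTarget` are hypothetical names for a check you could run yourself. This is not Novu's code, and it does not claim that Novu falls back this way.

```typescript
// Hypothetical helper, not part of Novu: work out which Redis instance the
// cache/lock connection would point at, given the environment variables
// mentioned in this thread.
type Env = Record<string, string | undefined>;

interface RedisTarget {
  host: string;
  port: number;
}

function resolveCacheRedis(env: Env): RedisTarget {
  // Treat empty strings as unset and reuse the main Redis values.
  const host = env.REDIS_CACHE_SERVICE_HOST || env.REDIS_HOST;
  const rawPort = env.REDIS_CACHE_SERVICE_PORT || env.REDIS_PORT;
  if (!host || !rawPort) {
    throw new Error("No Redis host/port configured");
  }
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid Redis port: ${rawPort}`);
  }
  return { host, port };
}
```

Calling `resolveCacheRedis(process.env)` at container startup would surface a missing or malformed value before the worker begins processing jobs.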

jarredwittOP•2y ago

Hi @pablo.fernandez.otero, thanks for the quick reply. Those are both configured in my worker container, but they are set to the same Redis host and port as the REDIS_HOST and REDIS_PORT variables. Does the cache service need to be pointed to a different Redis instance?

--•2y ago

No, it can work with the same single instance you have. That said, the fact that the failed job's lock mentions repeat: raises an eyebrow. It is very odd and worth investigating on our side.

jarredwittOP•2y ago

Ok, so those variables are set correctly then.

--•2y ago

Since which version did you experience this?

jarredwittOP•2y ago

Since the upgrade to v14.

Could it have something to do with a workflow failing because a channel does not exist? I have some failures in my workflows due to users not having the push notification channel configured. If you scroll up a few messages you can see the layout of the workflow.

Since push notifications are opt-in and the user could simply not want to receive them, is there a way to skip that part of the workflow if the user doesn't have the channel configured? That would avoid doing a pre-check for the user and creating two separate workflows that accomplish the same task.
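
To make the requested behaviour concrete, here is a minimal sketch of skipping steps whose channel the subscriber has not configured. The `Channel`, `Step` and `Subscriber` types and the `runnableSteps` function are purely hypothetical and are not Novu's API or data model. The sketch only illustrates the filtering idea.

```typescript
// Hypothetical types for illustration only; these are not Novu's API.
type Channel = "email" | "sms" | "push" | "in_app";

interface Step {
  name: string;
  channel: Channel;
}

interface Subscriber {
  id: string;
  // Channels the subscriber has actually configured (e.g. accepted push).
  channels: ReadonlySet<Channel>;
}

// Keep only the steps whose channel the subscriber has configured, so a
// missing push token skips the push step instead of failing the workflow.
function runnableSteps(steps: readonly Step[], subscriber: Subscriber): Step[] {
  return steps.filter((step) => subscriber.channels.has(step.channel));
}
```

With this approach a single workflow definition serves both opted-in and opted-out subscribers, rather than maintaining two near-identical workflows.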

--•2y ago

I think it is worth opening an issue on GitHub. I think the lock error is somewhat misleading and the problem could come from elsewhere. It wouldn't make sense for us to execute part of the workflow if the channel is not configured because of the user's preferences (opt-out). Please include the version you are using where this happens, the example workflow, and the configuration of the different steps. Please also describe, as you did just now, how your infra is configured, reminding us that you have set the Redis Cache Service variables to the values of REDIS_HOST and REDIS_PORT. 🙏 That way we can investigate whether we missed something on our side, or whether this is expected and we need to handle the logging differently so it doesn't pollute your logs.

jarredwittOP•2y ago

Ok, I'll file an issue on GitHub. Also, just FYI, here is what I see in the web admin. When you say "it wouldn't make sense for us to execute a part of the workflow if the channel is not configured", does that mean it shouldn't be trying to execute and this is a possible bug, or that this type of logic should be included in a future release?

--•2y ago

Sorry for not being clear. In the screen you are sharing, we report an error to the user saying that the subscriber who should receive this notification has not properly configured the Push channel. There are two possibilities: either the channel provider for Push is not configured properly for some reason or, as you said, the subscriber has opted out of receiving push notifications. In the second case, it wouldn't make sense for us to run the jobs related to the push notification, since we are not meant to send it, and so we would avoid any side effects of running those jobs (such as locking the job for the scheduled notification features, and many more). So we should review whether we are throwing an unexpected error, whether we are not managing the flow properly, or whether it needs to behave like this for internal reasons. If it is a bug, we will need to fix it in a future release. If we conclude that it is not a bug and this is our chosen behaviour, we might look for ways to pollute the logs less, though that could be a double-edged sword in certain cases. Hope this makes more sense.

jarredwittOP•2y ago

Ok, thanks for the info. The push notification service is configured correctly, because most of our users have it configured and they successfully receive pushes. We have a few who have opted out, which is the case for the subscriber in the shared image.

--•2y ago

This might be the most important part, so please specify it in the issue so that whoever picks it up knows exactly how to test it and what to look for. I really appreciate how clearly you have explained what you are doing. It will be very helpful for us. Thank you.

jarredwittOP•2y ago

Very welcome, I'll get an issue filed.

dmgarland•2y ago

I also see this in our logs (self hosted). Was there a resolution? Was there an issue filed?

--•2y ago

I haven't been able to find it in GitHub issues, so I am afraid @jarredwitt didn't have time for that.

Vinayak_H•2y ago

Any update on this issue? I am also facing the same problem (self-hosted, v16).

Pawan Jain•2y ago

@Vinayak_H Could you please create an issue on GitHub for this? We are closing this post due to inactivity. Feel free to create a new post if you have any questions or are still facing the issue.
