Novu 8mo ago
dmulligan

MongoDB : Jobs table - High number of records returned

I noticed that we have 750k records in the jobs table in MongoDB, and the vast majority are marked with a state of 'complete'. We are seeing high load on our AWS DocumentDB and were wondering whether we can safely delete the completed job records. As an aside, why are we seeing around 10k records read per minute? This happens non-stop overnight, when the system is not seeing much use.
8 Replies
dmulligan 8mo ago
Just answering this for anyone else who has issues with AWS DocumentDB. I was seeing slow performance, and in this question I wanted to find out whether the number of records in the database was a factor. I still need to investigate whether records get cleaned up on their own; I suspect they do not, as expireAt is not set for jobs or messages. The actual issue here was missing indexes. There appears to be a compatibility issue when it comes to creating indexes in DocumentDB. Once I identified this and created the missing indexes on the job and message tables, I noticed a huge speed increase.
Zac Clifton 8mo ago
What you found is what I was going to recommend. I would also recommend setting up the expireAt index on the executionDetails and Notifications collections. If you figure out why the indexes are not being created, write it here and I would be happy to put it in the documentation.
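A TTL index along the lines suggested here could look like the sketch below, written for mongosh. It assumes the documents carry a date field named expireAt; the helper name ensureTtlIndex is hypothetical, and the collection names in the usage comment are guesses that should be checked against `show collections` first. With expireAfterSeconds set to 0, MongoDB removes a document once the time stored in the indexed field has passed, and documents without that field are not expired. Whether DocumentDB behaves identically was not confirmed in this thread.

```javascript
// Sketch only: create a TTL index so documents are removed once the
// date stored in `field` has passed (expireAfterSeconds: 0).
// `ensureTtlIndex` is a hypothetical helper, not part of Novu.
function ensureTtlIndex(db, collectionName, field = "expireAt") {
  return db
    .getCollection(collectionName)
    .createIndex({ [field]: 1 }, { expireAfterSeconds: 0 });
}

// Usage in mongosh, where `db` is the current database.
// The collection names below are assumptions:
//   ensureTtlIndex(db, "executiondetails");
//   ensureTtlIndex(db, "notifications");
```

Passing the database handle in, rather than relying on the global, keeps the helper easy to try against a test database before touching production.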
todd 8mo ago
If it isn't too lazy on my part, could you link me to the code that does this? I am maintaining a "self-hosted" MongoDB Atlas (rather than DocumentDB) and trying to get on top of the migrations. Thanks in advance, and thanks @dmulligan for the heads up.
Zac Clifton 8mo ago
@Gali Ainouz-Baum Would you be able to point us in the right direction?
Zac Clifton 8mo ago
Yes, thank you! @todd, here is where we create the index
dmulligan 8mo ago
@todd only a few of the indexes were created to get us out of a hole; I will loop back around and check which others need to be created.
db.messages.createIndex({"_subscriberId":1})
db.messages.createIndex({"_environmentId":1})
db.jobs.createIndex({"_environmentId":1})
db.jobs.createIndex({"_subscriberId":1})
db.jobs.createIndex({"_organizationId":1})
db.jobs.createIndex({"_parentId":1})
Fetching a non-cached feed for a subscriber went from around 10-15 seconds to 50-80 ms.
todd 8mo ago
Sorry, caught up. Great work all.
@Gali Ainouz-Baum @Zac Clifton thanks, I see now where this class of work is done.
@dmulligan excellent work. Basically, an index reconciliation needs to happen (code to actuals: add and possibly remove).
Would these be fair statements:
* the collections are provisioned as part of a bootstrap process (rather than an out-of-process migration)
* provisioning the collections includes schemas and indexes
* any changes, functional (schema) or performance (index), should be included in the code base (and thus statement one then applies them for all)
What I saw as a recommendation was a hand tweak for performance that would break these statements. The thing I wonder is: why is Novu in production not experiencing what @dmulligan is reporting? 🙂
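The index reconciliation described here (code to actuals) could be sketched roughly as below. This is an illustration, not how Novu does it: diffIndexes is a hypothetical helper, the expected list is assumed to be maintained by hand from the code base, and the actual list is assumed to have the shape mongosh returns from getIndexes() (objects with key and name). It only reports differences; creating or dropping anything is left to the caller.

```javascript
// Sketch only: compare the indexes the code base expects with what a
// collection actually has. `expected` is a list of key specs such as
// { _subscriberId: 1 }; `actual` has the shape returned by getIndexes().
function diffIndexes(expected, actual) {
  // Field order is significant for indexes, so a plain JSON string of
  // the key spec is used as the comparison signature.
  const sig = (key) => JSON.stringify(key);
  const actualKeys = new Set(actual.map((ix) => sig(ix.key)));
  const expectedKeys = new Set(expected.map(sig));

  // Expected by the code but absent from the collection.
  const missing = expected.filter((key) => !actualKeys.has(sig(key)));

  // Present on the collection but not expected; the default _id index
  // is skipped because it can never be dropped.
  const extra = actual
    .filter((ix) => ix.name !== "_id_")
    .filter((ix) => !expectedKeys.has(sig(ix.key)))
    .map((ix) => ix.name);

  return { missing, extra };
}

// Usage in mongosh (keys taken from the indexes listed in this thread):
//   const { missing, extra } = diffIndexes(
//     [{ _environmentId: 1 }, { _subscriberId: 1 },
//      { _organizationId: 1 }, { _parentId: 1 }],
//     db.jobs.getIndexes()
//   );
//   missing.forEach((key) => db.jobs.createIndex(key));
//   printjson(extra); // review by hand before dropping anything
```

Reporting the extra indexes instead of dropping them automatically is deliberate: a hand-made index may be the only thing keeping a slow query fast.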