"Metadata version not found for workspace" after moving my workspace/database to a new environment.

A few weeks ago I migrated our old workspace data to a new database with a new workspace, because our metadata was "broken" on the old workspace. The migration was done by manually copying the data from the old database to the new one; it went well, and I even checked with @Prastoin if everything was working. This migration was done locally on my laptop and worked perfectly. We then moved the data to our server by making a psql dump on my laptop and restoring it on the server. The Docker setup is exactly the same on my laptop and on the server. But now I am getting these "Metadata version not found for workspace" errors. I ran the commands yarn command:prod workspace:sync-metadata and yarn command:prod cache:flush in both my twenty-server and twenty-worker containers. That fixes the errors for about 2 days, and then I get the error again. I have not restarted the containers.
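For reference, this is roughly how those commands are run (just a sketch; the docker compose service names are the ones from my setup and may differ in yours):

# Sketch: running the recovery commands inside both containers (service names assumed from this setup)
docker compose exec twenty-server yarn command:prod workspace:sync-metadata
docker compose exec twenty-server yarn command:prod cache:flush
docker compose exec twenty-worker yarn command:prod workspace:sync-metadata
docker compose exec twenty-worker yarn command:prod cache:flush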
43 Replies
Prastoin
Prastoin5mo ago
Hey @Caspersonn, which Twenty version are you on exactly, please? That's weird, because this is a cache warmup issue; it should be fixed by the cache:flush/sync-metadata commands.
Caspersonn
CaspersonnOP5mo ago
Hey @Prastoin. We are currently using version v0.51.13, and I double-checked in core.workspace.
Prastoin
Prastoin5mo ago
The issue would come from your Redis instance rather than the database.
double checked in the core.workspace.
I assume workspaceId does exist in your db
Caspersonn
CaspersonnOP5mo ago
Yes, I know. I ran the command redis-cli --scan --pattern '*' in my Redis container to check whether the keys are there, and they are.
(screenshot attached)
Prastoin
Prastoin5mo ago
Is that all? We should have an engine:workspace:metadata:workspace-metadata-version:$workspaceId key. We're going to run a command that programmatically sets the incremented metadata version, but this is buggy. About to share a command that does the job in 0.51.12.
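A quick way to check for it from inside your Redis container is a scan restricted to that prefix (just a sketch):

# List only the workspace metadata-version keys
redis-cli --scan --pattern 'engine:workspace:metadata:workspace-metadata-version:*'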
Caspersonn
CaspersonnOP5mo ago
Okay, thanks!
Prastoin
Prastoin5mo ago
My bad, it's already in there, third from the end. On what kind of operation does this error occur?
Caspersonn
CaspersonnOP5mo ago
When users want to log in. Then they get either an "Authentication failed" error or "Metadata version not found for workspace".
Caspersonn
CaspersonnOP5mo ago
Oops, I removed the message; these are the keys in the Redis cache.
Prastoin
Prastoin5mo ago
Could you read the "engine:workspace:metadata:workspace-metadata-version:d7858828-5f20-430a-ab76-e7555779126a" value?
Caspersonn
CaspersonnOP5mo ago
Sorry, what exactly do you mean?
Prastoin
Prastoin5mo ago
See what value is actually stored in the cache. It should be a number.
Caspersonn
CaspersonnOP5mo ago
Ooh yes, I am not a Redis expert. What command should I run?
Prastoin
Prastoin5mo ago
If it's 0, JavaScript might not like it 🙂 Something like GET key from your Redis connection.
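For example, with the key from my earlier message:

# Read the cached metadata version for this workspace; the stored value should be a number
redis-cli GET "engine:workspace:metadata:workspace-metadata-version:d7858828-5f20-430a-ab76-e7555779126a"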
Caspersonn
CaspersonnOP5mo ago
It's not 0...
(screenshot attached)
Prastoin
Prastoin5mo ago
Does this match your in-database core.workspace.metadataVersion value?
Caspersonn
CaspersonnOP5mo ago
Jep
(screenshot attached)
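A database-side check of this shape gives the value to compare (schema and column names are taken from the message above; the connection string is a placeholder for whatever your setup uses):

# Compare with the value cached in Redis; replace $PG_CONNECTION_STRING with your own connection details
psql "$PG_CONNECTION_STRING" -c "SELECT id, \"metadataVersion\" FROM core.workspace WHERE id = 'd7858828-5f20-430a-ab76-e7555779126a';"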
Prastoin
Prastoin5mo ago
Are we sure your twenty-server instance is correctly connected to your Redis instance? The problem does not seem to come from the Redis instance itself; maybe it's a networking issue. Maybe the server is just not able to hit, or is not hitting, the correct Redis instance 🤔 Created this https://github.com/twentyhq/twenty/pull/11829 just for the sake of the potential 0 version value. Hey @Weiko! Would you mind giving this thread and PR a look when you have some free time, please?
Caspersonn
CaspersonnOP5mo ago
The Redis port is forwarded, and there is only one Redis instance running on that specific server.
(screenshots attached)
Caspersonn
CaspersonnOP5mo ago
So I don't think it's a network issue.
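A rough way to double-check that from inside the server container (the compose service names are assumed, and nc may not be installed in the image):

# Attempt a TCP connection from the twenty-server container to the Redis service on port 6379
docker compose exec twenty-server sh -c 'nc -zv redis 6379'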
Prastoin
Prastoin5mo ago
I'm not aware of any fixes after 0.51.12 for such a behavior, but I would prefer upgrading to the latest 0.51 patch first rather than to 0.52.5, if you're willing to give it a try. I can't find any occurrences in our cloud env either.
Caspersonn
CaspersonnOP5mo ago
Hmhmhm, the latest patch of 0.51 is v0.51.14 and we are running v0.51.13. I was planning to upgrade to v0.52.5 next week. But our Twenty instance has to be restarted if I upgrade to the latest patch, and that is not really possible right now... Are there any big changes in the next v0.52 version?
Prastoin
Prastoin5mo ago
No worries, that's ok. No huge features, but fixes and quality enhancements. Could you confirm whether the "metadata version not found" error occurs each time you want to access Twenty, or only once? -> Are you able to access your Twenty instance?
Caspersonn
CaspersonnOP5mo ago
Okay, I'll explain: we cannot log in to the Twenty instance until I run the commands yarn command:prod workspace:sync-metadata and yarn command:prod cache:flush. It just gives an authentication error if I try over and over again. And as I said, I can access the Twenty instance after running the commands. To clarify, by accessing I mean logging in.
Prastoin
Prastoin5mo ago
Getting the error once after a cache flush is "normal". It will warm up the cache and retrying will succeed, normally even without a sync-metadata; if not, it means your workspace needed to get synced. We don't prevent such a use case, as we prefer a single failure rather than several cache writes at the same time.
Caspersonn
CaspersonnOP5mo ago
Okay, thanks for explaining. What's the best thing to do right now?
Prastoin
Prastoin5mo ago
From my understanding you should not have the error anymore?
Caspersonn
CaspersonnOP5mo ago
Jep, you are right. But it comes back after a couple of days, so that's the weird thing.
Prastoin
Prastoin5mo ago
Do you have any custom eviction policy on your Redis instance?
Caspersonn
CaspersonnOP5mo ago
Nope. It's the default Redis instance from the Docker setup.
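The policy actually in effect can be confirmed directly against the running instance:

# Show the eviction policy and memory limit Redis is currently using
redis-cli CONFIG GET maxmemory-policy
redis-cli CONFIG GET maxmemory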
Prastoin
Prastoin5mo ago
Mhhm, that's indeed weird. Is it like after a weekend? I'm trying to find a link between traffic and cache hydration. Will need to investigate further.
Caspersonn
CaspersonnOP5mo ago
It's between 48 and 72 hours, then the error is back. @Prastoin Maybe we could call, next week or so?
Prastoin
Prastoin5mo ago
I'm not sure that planning a call would be the most worthwhile for both of us; I'm about to dig deeper into our cache eviction management, which might not be the most suitable policy. I would expect it to be noeviction.
Caspersonn
CaspersonnOP5mo ago
Yeah, fair, and good luck. 🙃
Prastoin
Prastoin5mo ago
@Caspersonn for some reason your Redis instance may have to apply its default eviction policy, which currently seems to be volatile-lru (I've just changed it in abc05fafd7757d). What I would do is track your Redis used_memory against maxmemory (or check the history, not sure if such a thing exists natively in Redis).
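A minimal sketch for tracking that over time (the log file name and interval are arbitrary):

# Sample the relevant memory fields and the eviction counter once an hour
while true; do
  date >> redis-memory.log
  redis-cli INFO memory | grep -E '^(used_memory_human|maxmemory_human|maxmemory_policy)' >> redis-memory.log
  redis-cli INFO stats | grep -E '^evicted_keys' >> redis-memory.log
  sleep 3600
done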
Caspersonn
CaspersonnOP5mo ago
Okay. I have applied your new Redis config and it seems to work for now. I'll keep track of the memory and keep you up to date 🙃
Prastoin
Prastoin5mo ago
Sounds great, do not hesitate to reach out if you encounter any trouble. Please be aware that if this was the issue, it means you should upgrade your Redis container to have more CPU; if you don't, with the noeviction rule you might encounter cache write runtime errors.
Caspersonn
CaspersonnOP4mo ago
Hey @prastoin. Redis is still removing its keys after 2 or so days and I cannot figure out why.
Prastoin
Prastoin4mo ago
Hey @Caspersonn, do you have access to your Redis instance's memory usage metrics over time?
Caspersonn
CaspersonnOP4mo ago
No, only the live stats with docker stats. I also set these flags, --maxmemory-policy noeviction --appendonly yes, which I found online as a solution, but that didn't work either. I also ran info memory:
127.0.0.1:6379> info memory
# Memory
used_memory:5589128
used_memory_human:5.33M
used_memory_rss:17285120
used_memory_rss_human:16.48M
used_memory_peak:9498568
used_memory_peak_human:9.06M
used_memory_peak_perc:58.84%
used_memory_overhead:1269979
used_memory_startup:946288
used_memory_dataset:4319149
used_memory_dataset_perc:93.03%
allocator_allocated:8825256
allocator_active:9232384
allocator_resident:12398592
allocator_muzzy:0
total_system_memory:4063776768
total_system_memory_human:3.78G
used_memory_lua:239616
used_memory_vm_eval:239616
used_memory_lua_human:234.00K
used_memory_scripts_eval:89088
number_of_cached_scripts:8
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:272384
used_memory_vm_total_human:266.00K
used_memory_functions:1024
used_memory_scripts:90112
used_memory_scripts_human:88.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.05
allocator_frag_bytes:277240
allocator_rss_ratio:1.34
allocator_rss_bytes:3166208
rss_overhead_ratio:1.39
rss_overhead_bytes:4886528
mem_fragmentation_ratio:3.10
mem_fragmentation_bytes:11716664
mem_not_counted_for_evict:3072
mem_replication_backlog:0
mem_total_replication_buffers:0
mem_clients_slaves:0
mem_clients_normal:227395
mem_cluster_links:0
mem_aof_buffer:3072
mem_allocator:jemalloc-5.3.0
mem_overhead_db_hashtable_rehashing:0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:1496
if that helps :D
Prastoin
Prastoin4mo ago
Hey @Caspersonn, we've been discussing the way we throw errors after a cache miss. There's already a lock whenever a cache computation is done for a given key, to avoid write race conditions/concurrency issues. We will be enabling cache computation on failure in the next sprint (starting next Monday), which means it should not hard-fail for your use case anymore. But your Redis instance still seems to be facing a CPU/OOM issue 🤔
Caspersonn
CaspersonnOP4mo ago
Hey @prastoin. I didn't see your message, but okay, good to know. From which version on should this fix be available? We upgraded our non-prod environment to v0.54.6 and it is still facing the same issue. The CPU/OOM issue could be it: our machine only has 4 GB of RAM and 2 CPU cores, so that could explain the random deletion of keys. How much RAM and how many CPU cores do you recommend? Thanks for your help!
Prastoin
Prastoin4mo ago
Hey @Caspersonn, unfortunately I can't give an exact date right now. Regarding the 0.54.6 version, no surprise, as we haven't touched the Redis configuration lately. This might indeed be the root cause; I don't know if you could get the CPU usage metric over time in order to confirm this.
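If nothing better is available, a low-tech sketch that samples docker stats periodically could do it (the container name "redis" is a placeholder for your Redis container):

# Append a timestamped CPU/memory sample for the Redis container every minute
while true; do
  echo "$(date) $(docker stats --no-stream --format '{{.CPUPerc}} {{.MemUsage}}' redis)" >> redis-stats.log
  sleep 60
done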
