Kroks
Kroks8mo ago

❔ EFCore Sqlite Async: RAM and CPU Problem

So my code does the following: 30 Targets are being processed at once via async operations. Each second, for each target, I create 20 new Children of the target, so per second I have 20 * 30 new children. Every 20 children I call the "SaveAsync" method. However, before I add a child to the database I want to ensure that the child does not exist yet; if it already exists, I update its properties.
var existingUser = await ctx
.Children
.FirstOrDefaultAsync(x => x.RestID.Equals(scrapedUser.RestID));
I'm doing it with this code. I am assuming that EF Core loads all the child objects into memory, which leads to very high memory usage; how can I prevent this? It should operate on the database level each time and not load all objects into RAM. This is important because I have millions of child objects. The second issue is CPU. Without the database actions such as SaveAsync and FirstOrDefaultAsync, the CPU is around 1% (normal for my task); however, with those methods, after around 2 minutes the CPU is over 90%.
249 Replies
Angius
Angius8mo ago
No, EF does not load the whole database into memory. It generates an SQL query with a WHERE and LIMIT 1 in this case.
Kroks
Kroks8mo ago
Hmmm, what's the issue with the memory leaks then? Without DB usage I have consistently around 400 MB, and with it, it just goes up and up. I can send all operations that I do on the database side. Well, basically I only do the two that I just mentioned.
UltraWelfare
UltraWelfare8mo ago
Why not try profiling it to see which objects are doing the damage
Angius
Angius8mo ago
Huh, well, the only way I can see it being an issue is if maybe EF cannot translate that .Equals(). You should just take a look at what query gets generated.
Kroks
Kroks8mo ago
var existingUser = await ctx
.Children
.FirstOrDefaultAsync(x => x.RestID.Equals(scrapedUser.RestID));

if (existingUser != null)
{
existingUser.ScrapedDate = scrapedUser.ScrapedDate;
existingUser.Followers = scrapedUser.Followers;
existingUser.CanDM = scrapedUser.CanDM;
existingUser.StatusesCount = scrapedUser.StatusesCount;
}
else
{
ctx.Children.Add(scrapedUser);
}
so this code looks fine at first glance?
Angius
Angius8mo ago
Yep
Kroks
Kroks8mo ago
so the DBSet "Children" is not loaded into memory?
Angius
Angius8mo ago
Could probably use ExecuteUpdateAsync() instead, but this is valid. Nope, it should not be loaded unless you explicitly load it all.
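A minimal sketch of what the ExecuteUpdateAsync route could look like here (assuming EF Core 7 or later and reusing the entity and property names from the snippet above); the update runs as a single UPDATE statement on the database and bypasses the change tracker:

// Sketch only: assumes EF Core 7+ and the Children/RestID model shown above.
var updated = await ctx.Children
    .Where(x => x.RestID == scrapedUser.RestID)
    .ExecuteUpdateAsync(setters => setters
        .SetProperty(u => u.ScrapedDate, scrapedUser.ScrapedDate)
        .SetProperty(u => u.Followers, scrapedUser.Followers)
        .SetProperty(u => u.CanDM, scrapedUser.CanDM)
        .SetProperty(u => u.StatusesCount, scrapedUser.StatusesCount));

// ExecuteUpdateAsync returns the number of affected rows, so 0 means the user
// does not exist yet and still needs to be inserted the normal way.
if (updated == 0)
{
    ctx.Children.Add(scrapedUser);
}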
Kroks
Kroks8mo ago
I only do that + SaveChangesAsync after every 20 new children that I add to the DbSet, nothing else. So what's my best approach to solve this?
Angius
Angius8mo ago
Well, at a glance, there's... nothing to solve, everything seems fine. Unless you're doing something really stupid like newing up a new DbContext with every loop.
Kroks
Kroks8mo ago
I don't, actually. I have 30 DbContexts, as otherwise I get an exception for concurrency. One BotContext per async task that generates new children per target.
Angius
Angius8mo ago
How do you get those contexts?
Kroks
Kroks8mo ago
using var ctx = new BotContext(_db);
or what do you mean?
Angius
Angius8mo ago
Yeah, that seems fine-ish
Kroks
Kroks8mo ago
_db is path to sqlite
Angius
Angius8mo ago
I thought you might not be disposing of them, hence the memory issues. Huh.
Kroks
Kroks8mo ago
I mean they stay alive; the tasks that I was speaking of do not exit for hours, so each instance stays alive for hours. 30 instances.
Angius
Angius8mo ago
I'm out of ideas, I'm afraid. Try asking in #database maybe? Link this thread there, provide a short summary.
Kroks
Kroks8mo ago
Ok thanks for trying to help.
Insire
Insire8mo ago
If you don't dispose the DbContext, then each added Child/User will continue to be tracked, meaning memory usage will increase. Same for the updated user, since you seem to be tracking entities by default. So it's possible, with your current approach, to load the entire database into memory multiple times (as many times as you have semi-persistent DbContext instances). The solution is to dispose the DbContext earlier.
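A minimal sketch of that suggestion, assuming a loop shaped roughly like the one described in the thread (BotContext and _db are the OP's names; the loop condition and batch helper are made up for illustration):

// Sketch: scope one DbContext per batch so the change tracker is discarded
// regularly instead of accumulating entities for hours.
while (keepScraping) // hypothetical loop condition
{
    var batch = await GetNextBatchAsync(); // hypothetical: the ~20 users scraped this second

    await using (var ctx = new BotContext(_db))
    {
        foreach (var scrapedUser in batch)
        {
            // existing upsert logic (FirstOrDefaultAsync + update/Add) goes here
        }

        await ctx.SaveChangesAsync();
    } // ctx is disposed here and its tracked entities are released
}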
Kroks
Kroks8mo ago
Makes sense, I will defo try that. What about the CPU usage? Might it be because I save too frequently? 30 tasks save every 20 new children, so basically each second it saves 30 times. Or isn't that a problem?
Insire
Insire8mo ago
Saving and writing to the database in itself isn't CPU intensive, it's IO intensive. But your queries can become expensive given enough data to process, e.g. by having no or outdated indexes.
Kroks
Kroks8mo ago
But then I guess I have the issue that I create them too frequently. Like, I would have a new context per task each second. What is suggested here?
Insire
Insire8mo ago
dbcontext is pretty cheap to create
UltraWelfare
UltraWelfare8mo ago
Or a lot of indexes...
Kroks
Kroks8mo ago
so its fine if I have it in a loop but dispose after each iteration
Insire
Insire8mo ago
It kinda depends. This seems to enter the realm of mass data processing, which is fine for EF Core; it becomes troublesome when you have to update stuff frequently. Consider using ExecuteUpdate, or switch to something else like linq2db, which supports merge statements.
Kroks
Kroks8mo ago
The updates are very, very rare; it's just that the amount of data is a lot. I will try the new approach for the context now to see if it fixes both issues.
Insire
Insire8mo ago
my personal threshold is 1k tracked entities per dbcontext, i will try to use a new one, when i surpass that limit
Kroks
Kroks8mo ago
ok got it
Insire
Insire8mo ago
Because 1k-2k entities can be processed by EF Core with MS SQL Server in a single SaveChanges call in a reasonable timeframe. It's less for SQLite, iirc. This is not a limitation of EF Core, but of the DB driver only supporting a limited amount of query parameters per query.
Kroks
Kroks8mo ago
Sadly that didn't solve it. Like, CPU gets to 40% after 5 mins and RAM still explodes.
Insire
Insire8mo ago
its probably time to post some actual code, not just the query bit
Kroks
Kroks8mo ago
is vc possible too or no? because i cant actually share the code
Insire
Insire8mo ago
sadly not
Kroks
Kroks8mo ago
This is the critical code that does DB operations
Kroks
Kroks8mo ago
in parameterless ScrapeAsync method there is max 30 tasks happening
Insire
Insire8mo ago
the dbcontext isnt being disposed there
Kroks
Kroks8mo ago
in the method I mentioned?
Insire
Insire8mo ago
the code, you posted, yes
Kroks
Kroks8mo ago
the one in parameterless scrapeasync needs to be disposed after getting the targets from it? I thought its not as critical because from that context I just get the targets, nothing else. Its not used for anything else
Insire
Insire8mo ago
Change tracking works for every entity unless you disable it. Change tracking is what's causing your increasing memory.
Kroks
Kroks8mo ago
how do i disable it? also is this better?
var ctx = new BotContext(_db);

await ctx.Database.EnsureCreatedAsync();
var targets = await ctx.ScrapeTargets
.OrderBy(st => st.LastScraped)
.ToListAsync();

await ctx.DisposeAsync();
not sure if you meant that
Insire
Insire8mo ago
You need change tracking for updates, inserts and deletes, so you can't disable it.
Kroks
Kroks8mo ago
bruh
Insire
Insire8mo ago
But you can reset the change tracker by creating a new DbContext, or by calling a method on the change tracker itself; that's slower in my experience than just creating a new DbContext, though.
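For reference, the change-tracker method being alluded to is presumably ChangeTracker.Clear() (available since EF Core 5); a minimal sketch of resetting the tracker after a save instead of creating a new context:

// Sketch: detach everything the context is currently tracking so memory does
// not accumulate across iterations (EF Core 5+).
await ctx.SaveChangesAsync();
ctx.ChangeTracker.Clear();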
Kroks
Kroks8mo ago
am i not doing it properly right now? creating new db context each time
Insire
Insire8mo ago
yes, i dont know what you changed, but nothing relevant seems to have changed in your code, after i explained your problem and why you have it
Kroks
Kroks8mo ago
I moved the using in ScrapeAsync (the one with parameters) down into the loop so it does not stay around for long; after each iteration it's being disposed now. Or am I wrong?
Insire
Insire8mo ago
you havent shared code, so i cant comment
Kroks
Kroks8mo ago
I have here? or did I miss something
SuperBrain
SuperBrain8mo ago
I honestly hope you're not making something nasty here ...
Insire
Insire8mo ago
i dont see a using there
Kroks
Kroks8mo ago
no no
Kroks
Kroks8mo ago
(screenshot attached)
SuperBrain
SuperBrain8mo ago
Seems like a Discord bot that scrapes stuff somewhere, and considering that you're scraping a large amount of data, I'd first like to know a bit more about what you're trying to do. First, your issue is that you're creating the context within the loop. You should batch your operations. The context should be created outside the loop and used all the way until SaveChanges.
Kroks
Kroks8mo ago
I had that before
SuperBrain
SuperBrain8mo ago
Typically, you wrap your context within the using block.
Kroks
Kroks8mo ago
people told me to not
Insire
Insire8mo ago
They had a bunch of semi-persistent instances before. No, you did not.
Kroks
Kroks8mo ago
i had the contexts before the while(true) loop before the using statements
SuperBrain
SuperBrain8mo ago
Also, the fact that you're using SQLite could be part of your issues. Especially if there's multiple instances of your app/bot accessing same data/database.
Insire
Insire8mo ago
Batching with EF Core assumes you create a new DbContext per batch; anything else is pointless.
SuperBrain
SuperBrain8mo ago
Because SQLite, even though it does support concurrency, is not meant to be used for a large number of connections.
Kroks
Kroks8mo ago
30 is not rlly large ig
SuperBrain
SuperBrain8mo ago
30 what?
Kroks
Kroks8mo ago
30 tasks that do DB operations
Insire
Insire8mo ago
30 dbcontext instances that stay around
SuperBrain
SuperBrain8mo ago
As long as that amounts to a single DB connection, it's not an issue.
Kroks
Kroks8mo ago
Single file yeah
SuperBrain
SuperBrain8mo ago
You're missing the point.
Kroks
Kroks8mo ago
ok tell me
SuperBrain
SuperBrain8mo ago
Number of concurrent connections on your database.
Insire
Insire8mo ago
which is another issue, but not too relevant atm
Kroks
Kroks8mo ago
single
Insire
Insire8mo ago
A SQLite file DB does not care about the amount of connections you use, because it's single-threaded internally anyway. You can have 100 connections to it, but only one will write at any given time.
Kroks
Kroks8mo ago
Confused right now. I got told that I should dispose the contexts more frequently and not leave them open for hours, right? I closed them at method end, but the method only ends after hours/days. Yeah, thought so.
SuperBrain
SuperBrain8mo ago
Creating the context should be outside the while loop here.
Kroks
Kroks8mo ago
Ok, the issue is still there. I tested with both; after 2-3 mins the CPU is going nuts and RAM is just increasing and increasing. In the first 1-2 minutes CPU is 1-5%. Without the DB operations it stays at 1-5% and never goes up.
SuperBrain
SuperBrain8mo ago
There can be a lot of reasons for both CPU and RAM usage to keep going up.
Kroks
Kroks8mo ago
yeah but its DB Operations since without them no issues
Insire
Insire8mo ago
i still suspect the change tracker being at fault. i dont see how keeping the dbcontext around will help with that
SuperBrain
SuperBrain8mo ago
Because you're batching stuff, and it's kind of normal to keep increasing RAM usage until a context is actually disposed. What amount of RAM are we talking about?
Kroks
Kroks8mo ago
the amount just increases and increases, never finds an end. ok lets forget RAM for now but what about CPU
SuperBrain
SuperBrain8mo ago
Can you tell me what amount of RAM are we talking about, Megabytes, Gigabytes, Terabytes?
Kroks
Kroks8mo ago
ok so without DB Operations it was max 400 Megabyte, never increases as GC did its job. With DB Operations after 3mins I was at 800MB and it was just constantly increasing
SuperBrain
SuperBrain8mo ago
That's really not a lot.
Kroks
Kroks8mo ago
Yeah, but I plan to run the program 24/7, and since it constantly goes up it will fill the RAM completely within a few hours.
SuperBrain
SuperBrain8mo ago
As long as that gets disposed after SaveChanges, that's fine.
Insire
Insire8mo ago
its not
Kroks
Kroks8mo ago
I call savechanges after each iteration in loop
Insire
Insire8mo ago
thats the issue
SuperBrain
SuperBrain8mo ago
Well, you don't know that, you're only guessing based on current behavior, which is not a lot of RAM.
Kroks
Kroks8mo ago
in the while true
UltraWelfare
UltraWelfare8mo ago
Just because you dispose an object doesn't mean it gets cleared from ram
Kroks
Kroks8mo ago
yeah im assuming it
UltraWelfare
UltraWelfare8mo ago
It will be cleared when GC kicks in
Kroks
Kroks8mo ago
dispose is usually just for unmanaged memory that the GC cant collect
UltraWelfare
UltraWelfare8mo ago
Not always...
SuperBrain
SuperBrain8mo ago
Yeah, that's part of the problem. You can call SaveChanges after the while loop and EF will batch-process everything at once.
Insire
Insire8mo ago
There is no "after", it's a while(true).
Kroks
Kroks8mo ago
I mean, there is; it exits after hours via break. Hours/days.
SuperBrain
SuperBrain8mo ago
Then use a counter and call SaveChanges after certain number of tasks / whatever have been processed.
Kroks
Kroks8mo ago
after certain number of users added?
Insire
Insire8mo ago
if it actually exits, then i say, let it run, until you confirmed memory is an issue
SuperBrain
SuperBrain8mo ago
Yeah, I agree with Insire.
Kroks
Kroks8mo ago
ok I can do that for the RAM problem. The CPU problem is like much more of a problem for me
SuperBrain
SuperBrain8mo ago
You have to test it for a longer time before making an assumption that there is an actual memory leak
Kroks
Kroks8mo ago
It gets to 90%+ after 5 mins.
SuperBrain
SuperBrain8mo ago
Then profiling your code is the way to go. You have to figure out exactly what is doing that. Perhaps it is expected behavior, since you're continuously running the whole thing, so it's going to run as fast as it can, which means it will use as much of the available processing power as it can get.
Insire
Insire8mo ago
im pretty sure you can write this a lot more efficient with ef core
Kroks
Kroks8mo ago
im new to EF so no clue tbh I freestyled this
SuperBrain
SuperBrain8mo ago
And using the .Equals() is totally wrong here. Use a normal == comparison. Unless you're comparing complex objects.
Kroks
Kroks8mo ago
doesnt matter if == or .Equals. I compare two strings
Insire
Insire8mo ago
does that actually matter for ef core?
Kroks
Kroks8mo ago
== is overridden in string; it does an equals check.
SuperBrain
SuperBrain8mo ago
Yes, because it will make it run client-side and will not fully translate to SQL, unless EF Core is clever enough to figure that out. Always use == unless there's a very specific reason to use .Equals().
Insire
Insire8mo ago
hm, i'd expect for ef core to generate a warning during runtime for that
SuperBrain
SuperBrain8mo ago
Yeah, I would expect the same, but this is SQLite, perhaps the provider does something to mitigate this particular case.
Insire
Insire8mo ago
i'll have to check that at work some time
SuperBrain
SuperBrain8mo ago
.Equals() is typically used to compare complex objects, not simple data types like strings.
Insire
Insire8mo ago
sure
SuperBrain
SuperBrain8mo ago
Yes, you can override it to do whatever you want, but that's not the point here. What exactly do you mean by this?
Kroks
Kroks8mo ago
string type overrides == operator
SuperBrain
SuperBrain8mo ago
So?
Kroks
Kroks8mo ago
Why would it matter whether it's == or not? Same operation.
SuperBrain
SuperBrain8mo ago
What matters is the .Equals(). EF Core may not be able to properly translate it to a SQL expression that gets run on SQLite.
Kroks
Kroks8mo ago
Ah yeah, I'm dumb, it's not evaluating on the C# side but generating queries. Changed it now 👍🏻
SuperBrain
SuperBrain8mo ago
Exactly. It's called "server-side vs client-side evaluation". Speaking of connection lifetime, since this is SQLite, you don't really have to worry about it.
Insire
Insire8mo ago
unless its the inmemory variant, then you do
Kroks
Kroks8mo ago
i will do tests and let you guys know the results
SuperBrain
SuperBrain8mo ago
using (var ctx = new DbContext())
{
    int counter = 0;
    while (...)
    {
        // Do your processing here
        counter++;
        if (counter == 100)
        {
            await ctx.SaveChangesAsync();
            counter = 0;
        }
    }
}
This is one simple way to batch 100 tasks at a time. GC should keep up over time. You may see increased memory usage, but it should not go beyond a certain threshold. There are ways to make the GC run more frequently or even at specific times. There's also a project directive which changes this behavior.
Insire
Insire8mo ago
once the entire db is in memory, the memory usage increase will stop :P
Angius
Angius8mo ago
You could also not use the change tracker, and instead of loading the entry, editing it, saving changes... use .ExecuteUpdateAsync()
SuperBrain
SuperBrain8mo ago
Try adding
<ServerGarbageCollection>false</ServerGarbageCollection>
in your .csproj file, in the top PropertyGroup section.
Angius
Angius8mo ago
If the EF version used allows
Insire
Insire8mo ago
They do inserts as well, so they are not getting away from the change tracker, and inserts seem to be the majority of actions.
Kroks
Kroks8mo ago
I will try this, thanks. Yeah, should I dispose the DbContext too and create a new one? Because currently my contexts are open for hours before disposal, around 30 contexts.
SuperBrain
SuperBrain8mo ago
Are you saying that you have 30 contexts running in parallel? And to answer your question, no, you should not dispose of DbContexts manually; that's why you're wrapping them in a using() block, to have that done for you. Best I can say right now - I think you have to rethink your whole workflow. And before you do that, try to learn a bit more about a few other things:
- How SQLite works
- How EF Core DbContext works
- DbContext lifetime
- How things work when using await / async
Kroks
Kroks8mo ago
For async things I'm pretty good already. But yeah, for EF I'm missing experience. I will solve it and post the solution here.
var existingUser = await ctx
.ScrapedUsers
.FirstOrDefaultAsync(x => x.RestID == scrapedUser.RestID);

if (existingUser != null)
{
existingUser.ScrapedDate = scrapedUser.ScrapedDate;
existingUser.Followers = scrapedUser.Followers;
existingUser.CanDM = scrapedUser.CanDM;
existingUser.StatusesCount = scrapedUser.StatusesCount;
}
else
{
ctx.ScrapedUsers.Add(scrapedUser);
}
This is the critical part. Even without SaveChangesAsync it goes up to 100% CPU after a while.
SuperBrain
SuperBrain8mo ago
Ideally, if you can, share a GitHub repo for your project. That way we can have a better picture of what's going on in your entire app, because the fact that you're experiencing this when adding database operations still doesn't mean it's EF Core's issue.
Kroks
Kroks8mo ago
I would, but I cannot, as it's partially under NDA. If you are down, I'm open to VC.
Insire
Insire8mo ago
thats the part where change tracking happens
Kroks
Kroks8mo ago
yeah. So I excluded basically everything, even savechanges. its this part that causes trouble
Insire
Insire8mo ago
if (existingUser != null) thats the only line where nothing is tracked
Kroks
Kroks8mo ago
How does it track property changes? I'm curious. Did it inject some kind of CIL code into the setter methods?
SuperBrain
SuperBrain8mo ago
The moment you get a reference to existingUser it becomes tracked. Change tracking is the default behavior of EFCore.
Kroks
Kroks8mo ago
yeah I got that but I just want to know how it tracks the changes internally not that its important here not really
SuperBrain
SuperBrain8mo ago
For that, you have to read the docs.
Kroks
Kroks8mo ago
Just curious, kk. Yeah, so this is fucked, but I dunno how and why. How can I see the SQL queries generated by EF Core? Maybe it's doing dumb stuff.
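A minimal sketch of one way to see the generated SQL, assuming the context is configured in OnConfiguring (LogTo is available since EF Core 5; LogLevel needs using Microsoft.Extensions.Logging;):

// Sketch: inside BotContext
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder
        .UseSqlite($"Data Source={_db}")                 // assumes _db is the SQLite file path
        .LogTo(Console.WriteLine, LogLevel.Information)  // prints the generated SQL to the console
        .EnableSensitiveDataLogging();                   // also show parameter values (dev only)
}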
Kroks
Kroks8mo ago
Ok, just tested: even without the if (existingUser != null) case it's still going nuts. So it's either var existingUser = await ctx.ScrapedUsers.FirstOrDefaultAsync(x => x.RestID == scrapedUser.RestID); or ctx.ScrapedUsers.Add(scrapedUser); - for CPU I'm guessing the first.
UltraWelfare
UltraWelfare8mo ago
Add does nothing particularly useful; all it does is change the entry state to "Added".
Kroks
Kroks8mo ago
so its this var existingUser = await ctx .ScrapedUsers .FirstOrDefaultAsync(x => x.RestID == scrapedUser.RestID);
UltraWelfare
UltraWelfare8mo ago
(i mean its still useful, just not important in terms of cpu time)
Insire
Insire8mo ago
well, it also needs to check whether that entity is already being tracked
UltraWelfare
UltraWelfare8mo ago
That wouldn't account for a 90% CPU jump 😆 All it does is look at the primary key, to check whether it's set. No database roundtrips.
Insire
Insire8mo ago
is that an assumption, or did you actually go and look at the code?
UltraWelfare
UltraWelfare8mo ago
It says it in the docs, give me a minute
Kroks
Kroks8mo ago
its like after 5min
Insire
Insire8mo ago
because that explanation does not explain the error ef core throws, when you try to track another entity with the same primary key
Kroks
Kroks8mo ago
in one minute it goes like 3% then 11% then 9% then 3% but after sometime it just explodes
UltraWelfare
UltraWelfare8mo ago
just hotspot profile your application with a snapshot
Kroks
Kroks8mo ago
i will check, never done profiling tbh
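A minimal sketch of how to get started with the built-in .NET CLI diagnostics tools (assuming the .NET SDK is installed; <pid> is the process id of the running scraper):

# Install the diagnostic tools once
dotnet tool install --global dotnet-counters
dotnet tool install --global dotnet-trace

# Live CPU / GC / allocation counters for the running process
dotnet-counters monitor --process-id <pid> System.Runtime

# Capture a CPU trace that can be opened in Visual Studio, PerfView or speedscope
dotnet-trace collect --process-id <pid>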
UltraWelfare
UltraWelfare8mo ago
DbContext.Add and DbSet<TEntity>.Add do not normally access the database, since these methods inherently just start tracking entities. However, some forms of value generation may access the database in order to generate a key value. The only value generator that does this and ships with EF Core is HiLoValueGenerator<TValue>. Using this generator is uncommon; it is never configured by default. This means that the vast majority of applications should use Add, and not AddAsync. Other similar methods like Update, Attach, and Remove do not have async overloads because they never generate new key values, and hence never need to access the database.
Insire
Insire8mo ago
That's not really an answer to my question. It says "start tracking"; it does not confirm whether they only check whether the primary key holds its default value.
UltraWelfare
UltraWelfare8mo ago
It doesn't. You can easily test it: if you create a new model, set its id to an existing one, "Add" it and then "SaveChangesAsync", it will throw exceptions at you (at the SaveChangesAsync line).
Insire
Insire8mo ago
i know it does. but you claimed otherwise before and i was asking whether that was an assumption, or if you did look at the code
UltraWelfare
UltraWelfare8mo ago
I didn't claim otherwise..? I said it shouldn't account for a 90% cpu jump because it literally doesn't check the database
Insire
Insire8mo ago
idk, it sounds like you did
(screenshot attached)
Insire
Insire8mo ago
especially the last part does
UltraWelfare
UltraWelfare8mo ago
i meant it looks inside the class not inside the database my bad
Insire
Insire8mo ago
i mean sure, db lookups are IO work, not CPU work
UltraWelfare
UltraWelfare8mo ago
It looks at whether the property you have set as a key (with fluent API or an annotation) has a value or not. Anyway, at the end of the day it doesn't do DB trips.
Insire
Insire8mo ago
in general it doesnt, you are right. there are exceptions tho
UltraWelfare
UltraWelfare8mo ago
Add just checks inside its change tracker dictionary whether there's another entity with the same id (this throws an exception); otherwise it marks the object as "Added" and it will be inserted at "SaveChangesAsync()". However,
Insire
Insire8mo ago
but i digress
UltraWelfare
UltraWelfare8mo ago
if you do this:
var item = await ctx.Whatever.SingleAsync(...);

var item2 = new Item();
item2.Id = 2;

ctx.Whatever.Add(item2);
if the item and item2 coincidentally end up with the same ID its gonna throw
Insire
Insire8mo ago
yea
UltraWelfare
UltraWelfare8mo ago
Having a single context for him isn't a big issue. However, yeah, of course the RAM will keep growing; the change tracker is gonna grow as long as there are new "users" (or whatever) being added. I'd definitely change ScrapeFollowersFromEntries to not receive a DbContext, and instead create one inside. Then all I can recommend is hotspot profiling.
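A rough sketch of that last suggestion (ScrapeFollowersFromEntries is the OP's method; its signature and the entries parameter are assumed here, and the upsert body is the one posted earlier in the thread):

// Sketch: the method owns its own short-lived context, so every call starts
// with an empty change tracker.
private async Task ScrapeFollowersFromEntries(IEnumerable<ScrapedUser> entries) // signature assumed
{
    await using var ctx = new BotContext(_db);

    foreach (var scrapedUser in entries)
    {
        var existingUser = await ctx.ScrapedUsers
            .FirstOrDefaultAsync(x => x.RestID == scrapedUser.RestID);

        if (existingUser != null)
        {
            existingUser.ScrapedDate = scrapedUser.ScrapedDate;
            existingUser.Followers = scrapedUser.Followers;
            existingUser.CanDM = scrapedUser.CanDM;
            existingUser.StatusesCount = scrapedUser.StatusesCount;
        }
        else
        {
            ctx.ScrapedUsers.Add(scrapedUser);
        }
    }

    await ctx.SaveChangesAsync();
} // context (and its change tracker) disposed here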
Insire
Insire8mo ago
thats what i have been saying, but ¯\_(ツ)_/¯ create and dispose a dbcontext every 1k entries being added
UltraWelfare
UltraWelfare8mo ago
why not every function call? its still gonna throw exceptions
Insire
Insire8mo ago
because while dbcontext creation is cheap, its still not free
UltraWelfare
UltraWelfare8mo ago
at this point if you got that much function calls, just switch to plain SQL with Dapper
Insire
Insire8mo ago
i dont think dapper is an alternative to ef core
UltraWelfare
UltraWelfare8mo ago
or execute plainsql with efcore and forget about change-tracker
Insire
Insire8mo ago
Dapper is a micro-ORM, not an ORM. If you wanted a micro-ORM, you wouldn't choose EF Core.
Kroks
Kroks8mo ago
Idk man, I got like 3 different opinions on this. Some said it's fine to just let it dispose after the loop that takes hours/days, and you say I should dispose it after every 1k.
Insire
Insire8mo ago
¯\_(ツ)_/¯
UltraWelfare
UltraWelfare8mo ago
There's no answer, you have to test it. It depends on a lot of factors.
Insire
Insire8mo ago
you get to choose
UltraWelfare
UltraWelfare8mo ago
if you keep 1, you gonna have all the objects in memory (not instantly, but as you query and add items, they're kept in memory)
Insire
Insire8mo ago
All I know is that I have been doing pretty much exactly what you are doing for the past 1-2 years at work, and I am pretty confident that I know the limits of EF Core in this regard by now.
Kroks
Kroks8mo ago
its the firstordefault
Insire
Insire8mo ago
And if you actually want a replacement for EF Core, then you can look at something like linq2db for mass updates and inserts - although I just recently started using that, so I'm not an expert there.
Kroks
Kroks8mo ago
i love efcore but this kinda got annoying
UltraWelfare
UltraWelfare8mo ago
do you still have the equals or did you change it?
Kroks
Kroks8mo ago
Changed it: var existingUser = ctx.ScrapedUsers.FirstOrDefault(x => x.RestID == scrapedUser.RestID);
UltraWelfare
UltraWelfare8mo ago
otherwise, is your sqlite table optimized? is the RestID indexed?
Kroks
Kroks8mo ago
can you elaborate? what does it mean to be indexed
Insire
Insire8mo ago
did you add an index to the database column, for that table in sqlite
UltraWelfare
UltraWelfare8mo ago
EXPLAIN QUERY PLAN SELECT * FROM ScrapedUsers WHERE RestId = ?;
Run this manually with a valid RestID and show the output.
Kroks
Kroks8mo ago
SQLite creates the table for me. EF Core*, bruh.
Insire
Insire8mo ago
you can still define indexes
UltraWelfare
UltraWelfare8mo ago
Basically, if you don't have an index and you request an object from SQL, it will scan through the whole table to find what you want. An "index" turns that into a hashmap (not exactly), which is way faster.
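Since EF Core creates the schema here, the index can be declared on the model; a minimal sketch (assuming the ScrapedUser entity and RestID property used elsewhere in the thread):

// Sketch: inside the DbContext (e.g. BotContext)
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Index on RestID so the FirstOrDefault lookup becomes an index seek
    // instead of a full table scan; IsUnique() also enforces uniqueness.
    modelBuilder.Entity<ScrapedUser>()
        .HasIndex(u => u.RestID)
        .IsUnique();
}

Note that with EnsureCreatedAsync (as used above) the schema is only created for a new database file, so the index would require recreating the file or adding it by hand.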
Kroks
Kroks8mo ago
ah bro that might be the issue then
UltraWelfare
UltraWelfare8mo ago
yeah you're putting too much pressure on SQLite
Kroks
Kroks8mo ago
(screenshot attached)
Kroks
Kroks8mo ago
this seems fine then now?
UltraWelfare
UltraWelfare8mo ago
yeah
Insire
Insire8mo ago
is the RestID the primary key for ScrapedUser? if so, its already indexed
Kroks
Kroks8mo ago
It's not, actually. I should have made that the primary key, as it should be unique.
UltraWelfare
UltraWelfare8mo ago
that's another discussion lol
Kroks
Kroks8mo ago
yeah better to have a separate PK
UltraWelfare
UltraWelfare8mo ago
anyways, just add the index for now just to test if it's going to work
Insire
Insire8mo ago
yea, just because a column should only contain unique values, does not mean it has to be a primary key column (although its a strong indicator)
Kroks
Kroks8mo ago
Testing rn. After 5 min: 5% CPU, that's very good. Will keep it running. CPU is fixed now, checking RAM.
UltraWelfare
UltraWelfare8mo ago
So less of an EF Core issue and more of a database table optimization issue 😆 I'd advise you to read up on those topics, as they are very important 😄 Also, congrats on learning how to profile. There's also profiling for RAM usage, to see which objects take memory.
Kroks
Kroks8mo ago
yeah right, missing exp from my side lol but guys im somehow still leaking memory
Kroks
Kroks8mo ago
Even though I'm disposing the ctx after every 10th iteration. What can be the issue? Like, after approx 1 hr it's at 5 GB and it keeps increasing. Also, I get a "UNIQUE constraint failed" on RestID here?
var existingUser = await ctx
.ScrapedUsers
.FirstOrDefaultAsync(x => x.RestID == scrapedUser.RestID);

if (existingUser != null)
{
existingUser.ScrapedDate = scrapedUser.ScrapedDate;
existingUser.Followers = scrapedUser.Followers;
existingUser.CanDM = scrapedUser.CanDM;
existingUser.StatusesCount = scrapedUser.StatusesCount;
}
else
ctx.ScrapedUsers.Add(scrapedUser);
how is it even possible
Insire
Insire8mo ago
do you still have 30 dbcontext instances?
Kroks
Kroks8mo ago
yeah
Insire
Insire8mo ago
then how do you make sure that they dont try to insert a new user for the same id?
Kroks
Kroks8mo ago
i need them due to concurrency otherwise exception hm fuk
Insire
Insire8mo ago
yes
Kroks
Kroks8mo ago
do I need to semaphore lock? this region
Insire
Insire8mo ago
You could, but you'll also kill whatever advantage you gained by having 30 DbContext instances.
Kroks
Kroks8mo ago
I need them, otherwise I get exceptions. Concurrency exceptions.
Insire
Insire8mo ago
Then maybe have one DbContext instead; then you don't need locks.
Kroks
Kroks8mo ago
but then I have ram issue as I cannot dispose it
Insire
Insire8mo ago
? one at a time
Kroks
Kroks8mo ago
yeah
Kroks
Kroks8mo ago
(screenshot attached)
Kroks
Kroks8mo ago
(screenshot attached)
Kroks
Kroks8mo ago
still even though I made a simple lock here
Insire
Insire8mo ago
or, well you could just accept that, and add retry logic
Kroks
Kroks8mo ago
nah im not that dirty
Insire
Insire8mo ago
choices 💜
Kroks
Kroks8mo ago
efcore gives me cancer slowly
Insire
Insire8mo ago
That's how it is for any framework that you don't know. You'll have the same problem with any relational database, and maybe NoSQL databases too.
Kroks
Kroks8mo ago
This seems so illogical to me xd. Ok, I understood the problem, I guess. In one context I add the scrapedUser but I haven't saved it yet, meaning in another context I cannot find the entry, as it's not in the DB and not in that context yet. So yeah, one context is the way.
Insire
Insire8mo ago
Imagine having 2 DbContext instances at the same time; both are asked for the same user id, and both say they don't have it. Then you try to write both of them to the DB. I said that only one thread writes to the DB at any time, so one write succeeds and the other then fails, because by then the user already exists in the DB. Pretty simple imo.
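A minimal sketch of the single-context resolution, with the check-then-insert section serialized explicitly (a SemaphoreSlim rather than lock, because the region contains awaits; the helper name and shape are assumed):

// Sketch: one shared context, writes serialized so two tasks can never both
// decide the same RestID is missing and insert it twice.
private static readonly SemaphoreSlim _dbLock = new SemaphoreSlim(1, 1);

private async Task UpsertUserAsync(BotContext ctx, ScrapedUser scrapedUser) // hypothetical helper
{
    await _dbLock.WaitAsync();
    try
    {
        var existingUser = await ctx.ScrapedUsers
            .FirstOrDefaultAsync(x => x.RestID == scrapedUser.RestID);

        if (existingUser != null)
        {
            existingUser.ScrapedDate = scrapedUser.ScrapedDate;
            existingUser.Followers = scrapedUser.Followers;
            existingUser.CanDM = scrapedUser.CanDM;
            existingUser.StatusesCount = scrapedUser.StatusesCount;
        }
        else
        {
            ctx.ScrapedUsers.Add(scrapedUser);
        }

        await ctx.SaveChangesAsync();
    }
    finally
    {
        _dbLock.Release();
    }
}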
Kroks
Kroks8mo ago
yeah i solved it this explains it basically
Insire
Insire8mo ago
and the statement with the thread, dont take it too seriously. there is still synchronization, but not necessarily on the same thread
Kroks
Kroks8mo ago
not after 18hours work my brain is on 30%
Insire
Insire8mo ago
¯\_(ツ)_/¯ take a break get some sleep
Kroks
Kroks8mo ago
yeah now since it works haha
Accord
Accord8mo ago
Was this issue resolved? If so, run /close - otherwise I will mark this as stale and this post will be archived until there is new activity.