Best approach to fetch hundreds of thousands of records
I'm wondering what the best approach or best practice is for fetching hundreds of thousands of records for an initial full data load that gets transferred somewhere else?
Currently I'm using JPA + pagination and it is slow. I'm loading around ~400k objects into memory; would JdbcTemplate be the better approach? What if you also need to join a few tables?
What would you recommend here?
25 Replies
⌛ This post has been reserved for your question.
Hey @Kale Vivi! Please use /close or the Close Post button above when your problem is solved. Please remember to follow the help guidelines. This post will be automatically marked as dormant after 300 minutes of inactivity.
TIP: Narrow down your issue to simple and precise questions to maximize the chance that others will reply in here.
💤 Post marked as dormant
This post has been inactive for over 300 minutes, thus, it has been archived.
If your question was not answered yet, feel free to re-open this post or create a new one.
In case your post is not getting any attention, you can try to use /help ping.
Warning: abusing this will result in moderative actions taken against you.
JdbcTemplate might be faster, but it depends on how you are doing it and the exact issues.
Can you profile your application? Is most of the work done in the DB? IO? The application/processing?
For profiling, you could use something like VisualVM
Particularly on this area: SomeObject::from, I'm running into a memory issue loading all the objects.
Is there a more performant way to do this operation here?
Do you really need everything as a List?
Do you mean if I should use a stream instead?
At some point I need a list, but maybe I can keep it as a stream until that point, right?
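One way to sketch that: JPA's `Query.getResultStream()` (or JdbcTemplate's `queryForStream`) gives you a lazy Stream, and from there you can buffer one batch at a time instead of collecting everything. A minimal pure-Java illustration, where `forEachBatch` is a hypothetical helper and the `IntStream` stands in for the DB cursor:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.IntStream;

public class BatchDemo {
    // Hypothetical helper: drain an iterator (e.g. from a lazy DB stream)
    // in fixed-size batches so only one batch is ever held in memory.
    static <T> int forEachBatch(Iterator<T> rows, int batchSize, Consumer<List<T>> handler) {
        List<T> buffer = new ArrayList<>(batchSize);
        int batches = 0;
        while (rows.hasNext()) {
            buffer.add(rows.next());
            if (buffer.size() == batchSize) {
                handler.accept(List.copyOf(buffer));
                buffer.clear();
                batches++;
            }
        }
        if (!buffer.isEmpty()) {          // flush the final partial batch
            handler.accept(List.copyOf(buffer));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Simulate 400k rows lazily; in real code the iterator would come
        // from query.getResultStream().iterator() inside a transaction.
        int sent = forEachBatch(IntStream.range(0, 400_000).boxed().iterator(),
                                10_000, batch -> { /* send batch here */ });
        System.out.println(sent + " batches"); // 40 batches
    }
}
```

Note that with JPA you'd typically also call `EntityManager.clear()` (or use a stateless session) between batches so the persistence context doesn't grow unbounded.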
i mean technically you could save it all in a file
and then write another code that reads that file
idk if that's what you want though
It isn't
It needs to read from db
then fetch it with a stream
One other thing around this: do you have advice on debugging long-running operations? If you have to wait 2 hours to reproduce an issue, that is too long. Do you normally adjust the code to build a smaller sample that reflects only that object to try to reproduce and debug, or do you have any other tips around that?
i mean
if you wanna test some scheduler
you can just set up a new one or something
that calls every second
or if you are processing a huge file just put a small one for testing
Why?
Having a List requires storing it all in memory.
Ideally you would process them in batches and never store it all in memory.
It depends; we use JdbcTemplate to fetch about 19M rows. What is the problem you are trying to solve? Honestly, if it works, it works.
if you have the HW might as well put it to work
if you have a lot of duplicate strings you might want to look into string interning; it saved me a lot of memory without having to do much
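For context on the interning point: equal string values read row by row from JDBC usually arrive as distinct objects, and `String.intern()` collapses them to one pooled instance. A minimal illustration (the `"ACTIVE"` value is just a made-up example of a column with few distinct values):

```java
public class InternDemo {
    public static void main(String[] args) {
        // Two distinct String objects with identical content, like the
        // same status value read from thousands of DB rows.
        String a = new String("ACTIVE");
        String b = new String("ACTIVE");
        System.out.println(a == b);                   // false: two objects
        System.out.println(a.intern() == b.intern()); // true: one pooled copy
    }
}
```

A manual dedup map (e.g. `Map.computeIfAbsent(value, v -> v)`) achieves the same sharing without going through the JVM's intern pool, which some people prefer for very large datasets.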
well they store everything in a List at the end...
What do you mean by duplicate strings here?
I need to eventually send the data to an external API, and the API contract requires the objects to be in an array/list, etc.
maybe just make a big page size like 10k
if you have to send it in a list anyway
Do you really need to send all of the data in a single request?
Really dude?
Not in a single request; I take the list and make calls in batches async.
then you take each batch from the DB, convert that batch to a list and then make the call for that specific batch
don't create a list for all of the data
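That loop can be sketched in plain Java; here `transfer` is a hypothetical helper, the in-memory `source` list stands in for your paged DB query, and the consumer stands in for the API client:

```java
import java.util.List;
import java.util.function.Consumer;

public class PagedTransfer {
    // Hypothetical sketch: walk the source page by page and make one API
    // call per page, so the full ~400k rows are never in one List.
    static <T> int transfer(List<T> source, int pageSize, Consumer<List<T>> sendBatch) {
        int calls = 0;
        for (int offset = 0; offset < source.size(); offset += pageSize) {
            List<T> page = source.subList(offset, Math.min(offset + pageSize, source.size()));
            sendBatch.accept(page);   // in real code: one (possibly async) HTTP call per batch
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        int calls = transfer(java.util.Collections.nCopies(42_000, "row"), 10_000,
                             page -> { /* post page to the external API */ });
        System.out.println(calls + " API calls"); // 5 API calls
    }
}
```

In the real version each page would come from a keyset- or offset-paged JdbcTemplate query rather than `subList`, so only one page of rows is ever materialized.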
Makes sense, thank you
If you are finished with your post, please close it.
If you are not, please ignore this message.
Note that you will not be able to send further messages here after this post has been closed, but you will be able to create new posts.