Ash query with 1 filter performs worse than the same Ecto query
We've noticed quite bad performance when querying data produced by paper_trail using Ash. The same kind of queries made with regular Ecto.Query perform much better.
The tables involved are huge (it's paper trail after all), but the column by which we filter is indexed.
Upon examining the difference between the produced queries, the only difference I see is in the WHERE clause:
Ash:
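A sketch of the shape (the original snippet is missing; the alias and the exact placement of the redundant cast are assumptions, consistent with the discussion below):

```sql
-- assumed shape of the Ash-generated filter: bigint casts on both sides
WHERE v0."document_id"::bigint = $1::bigint
```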
Ecto:
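And the plain Ecto version, under the same assumptions:

```sql
-- no casts on either side
WHERE v0."document_id" = $1
```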
The document_id is added to the Version resources using attributes_as_attributes([:document_id]) within paper_trail.
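For context, a sketch of the relevant setup (assumed shape; real resource and module names will differ):

```elixir
# hypothetical sketch of the ash_paper_trail configuration being described
use Ash.Resource,
  extensions: [AshPaperTrail.Resource]

paper_trail do
  # copies document_id onto the generated Version resource as a plain attribute
  attributes_as_attributes [:document_id]
end
```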
Why is this casting produced in the Ash version of the query necessary? Is it something that can be avoided by better definitions and/or configuration?
what version of Ash are you using? I think Zach fixed something about that recently
https://discord.com/channels/711271361523351632/1245194918490542132/1379839226966835281
In prod we currently have ash 3.4.67 and ash_postgres 2.5.10. But I've tried locally with ash 3.5.18 and ash_postgres 2.6.6 - the same SQL query is produced.
😡 will fix
Question: is the performance issue related to not using an index?
nvm I can see that it is from your original question 🙂
Is document_id actually a bigint in your database?
So AFAIK, this should not stop the query from using an index
I did some local testing to that effect.
I removed the double query casting issue in a previous version, but it hit some edge cases where the type casting was doing real work, so I added it back in certain cases temporarily (we're not intentionally double type-casting, but it was a cosmetic issue, so better that than failing queries).
So the thing that we need to figure out at this point is:
- is the type casting actually preventing you from using the index
- if not, is the type casting actually the source of the issue
I'm working on a more targeted patch to ash_sql to reduce the double casting, but we'll need to confirm what the cause is. i.e. if you have access to production, can you run that same ecto query but like this:
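A hypothetical reconstruction of what's being asked for here: the same plain Ecto query, but with the bigint casts forced in, to isolate whether the cast alone changes the plan (Repo name is an assumption; the table name and id value come from this thread):

```elixir
# hypothetical: plain Ecto with the same bigint casts the Ash query produces
import Ecto.Query

document_id = 791785

query =
  from v in "proofreading_issues_versions",
    where: fragment("?::bigint = ?::bigint", v.document_id, ^document_id),
    select: v.id

MyApp.Repo.all(query)
```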
Also be aware that depending on how you are testing, the first time you run any ecto query you pay a caching cost, so if you're testing locally for example by running the ash query and then running the ecto query, make sure you run the ash query and the ecto query once before comparing.
Okay, so I have a fix for the double casting. BUT: keep in mind that, in general, index usage is typically helped by type casts, because casting resolves ambiguity about the type of something during planning. This may be slightly outdated depending on the postgres version you are on, but generally speaking, type casting into the types expected by an index at least won't hurt.
So it's possible there is some other kind of performance issue 🙂 i.e. maybe there is some kind of issue with loading data into Ash structs etc.
On ash_sql version 0.2.80, that query should now be generated as:
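A sketch of the new shape (assumed; the redundant parameter cast dropped, one cast remaining):

```sql
-- assumed shape on ash_sql 0.2.80: a single bigint cast left on the column
WHERE v0."document_id"::bigint = $1
```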
I did some benchmarking, and there is overhead to using an Ash action to do a lookup by id, but it's roughly 100 usec, which probably isn't worth optimizing at this point.
i.e. 100 usec to look up by id with Ecto, and 200 usec to look up by id with Ash. That's important, and there is a whole slew of code optimizations that could be made to that end, but still, 200 usec is pretty fast.
@arconaut any updates on this?
The latest ash_sql version 0.2.80 should fix the double casting, but I'd still be pretty surprised if that's your issue.
Run the ash query through EXPLAIN ANALYZE and share the results.
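For example, something like this (a sketch; the select list is elided, and the cast shape and literal id are assumptions based on this thread):

```sql
EXPLAIN ANALYZE
SELECT v0.* FROM proofreading_issues_versions AS v0
WHERE v0."document_id"::bigint = 791785;
```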
Sorry, a bit of a busy day yesterday, didn't get the chance to come back to it. I'll try to look into this today.
I can confirm that the double-casting is indeed gone in ash_sql 0.2.80.
I've run EXPLAIN ANALYZE for both versions of the query, each time with a new connection (subsequent queries do indeed use the cache).
The version with casting (as ash_sql 0.2.80 produces it) often even times out after 15 sec:
It looks like the index is indeed ignored when casting the argument. The document_id is of type integer in that table, is that perhaps the reason? We may have manually written the DB migrations for the paper_trail tables (from not knowing how to generate them back then).
If it helps:
- PostgreSQL 17.4
The slow seq scan version is definitely missing the index. Seq scan means it's reading the entire table.
Can you share the output of \d proofreading_issues_versions?
Might also be worth running analyze on the proofreading_issues_versions table. The query planner estimated that rows=25315 would be returned when it really returned rows=10. Analyze will update the statistics the postgres query planner uses and help it make better choices.
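For example:

```sql
-- refresh the statistics the planner uses for this table
ANALYZE proofreading_issues_versions;
```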
edit: I didn't read the bottom paragraph of your post before responding. 🤦‍♂️ If the db column is integer and the query is casting to bigint, I think that explains the index miss. I think there are certain situations where postgres will handle this gracefully, but I don't know or understand all the use cases for it.
Initially I would have thought that because your query parameter of 791785 is within the integer range it would have been able to convert it, but obviously it's missing the index. If the query parameter had been something above the integer max value, I would have expected an index miss.
Yes, that will be the problem 🙂
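For illustration, assuming a plain btree index on document_id, the difference boils down to something like:

```sql
-- cast on the integer column: the index on document_id can't be used
EXPLAIN SELECT * FROM proofreading_issues_versions
WHERE document_id::bigint = 791785::bigint;

-- no cast: a straightforward index scan
EXPLAIN SELECT * FROM proofreading_issues_versions
WHERE document_id = 791785;
```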
Let me look at something
Okay. So right now we are casting all integers to bigints, and I think no one cared because the migration generator also creates all integer fields as bigints. I will figure out a way for you to express on the attribute that it is a regular integer to avoid the type casting, and then you should be good to go 👌
Thanks, Zach! Sounds good!
As homework for our team - we should be more careful with integer types (maybe we need to implement some DB migrations)
Something that could also be pretty useful is for Ash to have a tool you can run to find things like this
Since we know all of the types of everything
@arconaut I think you actually have the tools to do this right now
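Something like this (a hypothetical sketch, not the exact snippet from the thread; the module name is made up, and it assumes overriding storage_type/1 on an Ash.Type.NewType is enough to drop the bigint cast):

```elixir
defmodule MyApp.Type.RegularInteger do
  # hypothetical: an integer type stored as a plain Postgres integer,
  # instead of the bigint that Ash's builtin :integer type assumes
  use Ash.Type.NewType, subtype_of: :integer

  @impl true
  def storage_type(_constraints), do: :integer
end
```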
See how that does for you
then use that in place of your integer type
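e.g. (hypothetical declaration, wherever the integer attribute is defined today):

```elixir
# in the attributes block of the resource that defines document_id
attribute :document_id, MyApp.Type.RegularInteger
```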
Oh, interesting, thanks!