Ash query with 1 filter performs worse than the same Ecto query

We've noticed quite bad performance when querying data produced by paper_trail using Ash. The same kind of queries made with regular Ecto.Query perform much better. The tables involved are huge (it's paper trail after all), but the column by which we filter is indexed. Upon examining the difference between the produced queries, the only difference I see is in the WHERE clause.

Ash:
OurApp.Resource.Version
|> Ash.Query.new()
|> Ash.Query.filter(document_id: document_id)
|> Ash.read!()
... WHERE (d0."document_id"::bigint::bigint = $1::bigint::bigint) [12345]
Ecto:
Repo.all(from(s in OurApp.Resource.Version, where: s.document_id == ^document_id))
... WHERE (d0."document_id" = $1) [12345]
The document_id is added to the Version resources using attributes_as_attributes([:document_id]) within paper_trail. Why is this casting produced in the Ash version of the query necessary? Is it something that can be avoided by better definitions and or configuration?
10 Replies
Rebecca Le
Rebecca Le4mo ago
what version of Ash are you using? I think Zach fixed something about that recently https://discord.com/channels/711271361523351632/1245194918490542132/1379839226966835281
arconaut
arconautOP4mo ago
In prod we currently have ash 3.4.67 and ash_postgres 2.5.10. But I've tried locally with ash 3.5.18 and ash_postgres 2.6.6 - same SQL query is produced.
ZachDaniel
ZachDaniel4mo ago
😡 will fix. Question: is the performance issue related to not using an index? nvm, I can see that it is from your original question 🙂 Is document_id actually a bigint in your database? AFAIK this should not stop the query from using an index; I did some local testing to that effect.

I removed the double query casting issue in a previous version, but it hit some edge cases where the type casting was doing real work, so I added it back in certain cases temporarily (we're not intentionally double type-casting, but it was a cosmetic issue, so better that than queries failing). So the thing we need to figure out at this point is:
- is the type casting actually preventing you from using the index?
- if not, is the type casting actually the source of the issue?

I'm working on a more targeted patch to ash_sql to reduce the double casting, but we'll need to confirm what the cause is. i.e. if you have access to production, can you run that same Ecto query but like this:
Repo.all(from(s in OurApp.Resource.Version,
  where: fragment("?::bigint::bigint", s.document_id) == fragment("?::bigint::bigint", ^document_id)))
Also be aware that depending on how you are testing, the first time you run any Ecto query you pay a caching cost. So if you're testing locally, for example by running the Ash query and then the Ecto query, make sure you run both once before comparing.

Okay, so I have a fix for the double casting. BUT: keep in mind that in general, index usage is typically helped by type casts, because they resolve ambiguity around the type of something during planning. This may be slightly outdated depending on the Postgres version you are on, but generally speaking, casting to the types expected by an index at least won't hurt. So it's possible there is some other kind of performance issue 🙂 i.e. maybe there is some kind of issue with loading data into Ash structs etc.

On ash_sql version 0.2.80 that query should now be generated as
WHERE (d0."document_id"::bigint = $1::bigint) [12345]
I did some benchmarking, and there is overhead to using an Ash action for a lookup by id, but it's roughly 100µs, which probably isn't worth optimizing at this point. i.e. ~100µs to look up by id with Ecto, and ~200µs with Ash. That's important, and there are a whole slew of code optimizations that could be made to that end, but still, 200µs is pretty fast. @arconaut any updates on this? The latest ash_sql version 0.2.80 should fix the double casting, but I'd still be pretty surprised if that's your issue.
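A sketch of the kind of side-by-side benchmark described above, using the Benchee library (the `document_id` value and warmup approach are assumptions, not from this thread; remember to run each query once beforehand so both sides have paid their query-cache cost):

```elixir
import Ecto.Query

# Sketch only: assumes a seeded document_id that exists in the table.
document_id = 12345

Benchee.run(%{
  "ecto lookup" => fn ->
    Repo.all(from(s in OurApp.Resource.Version, where: s.document_id == ^document_id))
  end,
  "ash lookup" => fn ->
    OurApp.Resource.Version
    |> Ash.Query.new()
    |> Ash.Query.filter(document_id: document_id)
    |> Ash.read!()
  end
})
```

Benchee handles warmup runs by default, which also neutralizes the first-run caching cost mentioned above.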
Failz
Failz4mo ago
Run the ash query through explain/analyze and share the results.
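For the Ecto side, one way to capture that plan from IEx without hand-writing SQL is `Ecto.Adapters.SQL.explain/4` from ecto_sql (a sketch; repo and resource names are taken from this thread, and `analyze: true` actually executes the query):

```elixir
import Ecto.Query

query = from(s in OurApp.Resource.Version, where: s.document_id == ^document_id)

# Returns the EXPLAIN ANALYZE output as a string
Repo
|> Ecto.Adapters.SQL.explain(:all, query, analyze: true)
|> IO.puts()
```

For the Ash query, you can grab the generated SQL from the debug logs and run EXPLAIN ANALYZE on it directly in psql.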
arconaut
arconautOP4mo ago
Sorry, a bit of a busy day yesterday, didn't get the chance to come back to it. I'll try to look into this today. I can confirm that the double casting is indeed gone in ash_sql 0.2.80. I've run EXPLAIN ANALYZE for both versions of the query, each time with a new connection (subsequent runs do indeed use the cache). The version with casting (as produced by ash_sql 0.2.80); often this even times out after 15 sec:
Gather (cost=1000.00..1085496.95 rows=60757 width=616) (actual time=6635.784..9291.554 rows=20 loops=1)
Workers Planned: 2
Workers Launched: 1
-> Parallel Seq Scan on proofreading_issues_versions p0 (cost=0.00..1078421.25 rows=25315 width=616) (actual time=7942.632..9267.804 rows=10 loops=2)
Filter: ((document_id)::bigint = '791785'::bigint)
Rows Removed by Filter: 5861568
Planning Time: 0.253 ms
Execution Time: 9291.610 ms
The version without casting (the plain Ecto query):
Index Scan using proofreading_issues_versions_document_id_index on proofreading_issues_versions p0 (cost=0.43..550.18 rows=162 width=616) (actual time=0.061..1.317 rows=20 loops=1)
Index Cond: (document_id = 791785)
Planning Time: 0.370 ms
Execution Time: 1.345 ms
It looks like the index is indeed ignored when the argument is cast. The document_id column is of type integer in that table; is that perhaps the reason? We may have manually written the DB migrations for the paper_trail tables (from not knowing how to generate them back then). If it helps, this is PostgreSQL 17.4.
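For reference: the plain btree index on the integer column can't serve a predicate on `(document_id)::bigint`, because the indexed expression no longer matches. An expression index over the cast would match it (a hypothetical workaround sketch, index name invented, not something suggested in the thread; the type fix discussed below is the cleaner solution):

```sql
-- The existing index covers the raw integer column:
--   proofreading_issues_versions_document_id_index ON (document_id)
-- and cannot serve the predicate (document_id)::bigint = $1.
-- An expression index over the cast matches that predicate exactly:
CREATE INDEX proofreading_issues_versions_document_id_bigint_index
  ON proofreading_issues_versions ((document_id::bigint));
```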
Failz
Failz4mo ago
The slow seq scan version is definitely missing the index. Seq scan means it's reading the entire table. Can you share the output of \d proofreading_issues_versions?

Might also be worth running ANALYZE on the proofreading_issues_versions table. The query planner estimated rows=25315 would be returned when it really returned rows=10. ANALYZE will update the statistics the Postgres query planner uses and help it make better choices.

edit: I didn't read the bottom paragraph of your post before responding. 🤦‍♂️ If the db column is integer and the query is casting to bigint, I think that explains the index miss. There are certain situations where Postgres will handle this gracefully, but I don't know or understand all of them. Initially I would have thought that because your query parameter of 791785 is within the integer range it would have been able to convert it, but obviously it's missing the index. If the query parameter had been something above the integer max value, I would expect an index miss.
ZachDaniel
ZachDaniel4mo ago
Yes, that will be the problem 🙂 Let me look at something. Okay: so right now we are casting all integers to bigints, and I think no one cared because the migration generator also creates all integer fields as bigints. I will figure out a way for you to express on the attribute that it is a regular integer, to avoid the type casting, and then you should be good to go 👌
arconaut
arconautOP4mo ago
Thanks, Zach! Sounds good! As homework for our team: we should be more careful with integer types (maybe we need to write some DB migrations).
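If the team does go the migration route, widening the column to match Ash's default is one option (a sketch with an invented migration name, assuming the table from the EXPLAIN output; note the cost on a table this large):

```elixir
defmodule OurApp.Repo.Migrations.WidenVersionsDocumentId do
  use Ecto.Migration

  # Widening integer -> bigint rewrites the table and holds an
  # ACCESS EXCLUSIVE lock for the duration; on a paper_trail table
  # with millions of rows, schedule this for a maintenance window.
  def change do
    alter table(:proofreading_issues_versions) do
      modify :document_id, :bigint, from: :integer
    end
  end
end
```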
ZachDaniel
ZachDaniel4mo ago
Something that could also be pretty useful is for Ash to have a tool you can run to find things like this, since we know the types of everything. @arconaut I think you actually have the tools to do this right now:
defmodule MyApp.Types.SmallInt do
  use Ash.Type.NewType, subtype_of: :integer

  # :int4 is Postgres's plain 4-byte `integer` type, matching the column
  # (smallint would be :int2), so ash_sql casts to ::int4 instead of ::bigint
  def storage_type(_), do: :int4
end
See how that does for you, then use it in place of your integer type.
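Used in a resource, that might look like the following (a sketch only; since document_id comes from paper_trail's attributes_as_attributes, the exact place to override the type may differ in practice):

```elixir
# Hypothetical resource attribute using the custom type, so generated
# SQL casts to ::int4, matching the integer column and its index.
attributes do
  attribute :document_id, MyApp.Types.SmallInt do
    allow_nil? false
    public? true
  end
end
```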
arconaut
arconautOP4mo ago
Oh, interesting, thanks!