Bad join query when using attribute tenancy
Hi! I'm noticing a major performance bug when using attribute-based multi-tenancy in our app:
We have an a resource
Visit
that has:
belongs_to :patient, Practices.Patient, public?: true
both tables have an attribute practice_id
which is the multitenancy attribute
Then I have a calculation that looks like this:
But loading this calculation causes a terrible query:
The unnecessary fields aren't terrible, but loading the entire patient table to get one causes this query to take 300ms instead of a few ms
Is there a way around this?
Ash version:
3.5.13
Ash postgres:
2.5.2223 Replies
🤔 are you sure that the unnecessary selected attributes has anything to do w/ the query performance?
it's not the selected attributes, it's the fact that it loads the entire patient table filter by tenant just to select one id
and that i confirmed via explain analyze
I should mention the
Patient
is a belongs_to
association on Visit
🤔 to be clear, you're saying that specifically removing those unnecessary fields from the select gets you a better query plan?
no sorry I think I made this confusing
There are two issues with this query:
1) It's loading unnecessary attributes which aren't required in the calculation - this is minor in this particular case, though in other cases could cause a problem
2) It's doing a subquery to load the entire table filtered by practice_id, and only then joining on the id. So it loads 1500 rows to select 1 of them, when it could filter to that row immediately within the subquery since the patient is a
belongs_to
association <-- This is what causes it to take 300ms instead of 3mshuh
There is no filter for the relationship attributes in that query?
this subquery
it's first filtering by practice_id (tenant), and then only after the subquery completes, it does the ON to filter by id
if it filtered by id within the subquery it'd be much faster
(i think)
You should confirm that 😄
ok let me try - it's hard to test this in prod where i have enough data but i can check it
Oh, I think I see what you mean
Its just not filtering on the multitenancy attribute at all
I get you
practice_id is a literal value that we have when building the query
let me double check before i send you down the wrong path 🙂 running explain analyze on the whole thing
No sorry I take it back - it does apply the primary key index before applying the other filter. I think the issue is elsewhere
Yeah, I'm not actually sure the kind of query you're suggesting is possible
the outer binding is not available inside the subquery
though is the overselecting of attributes still a bug?
Not a bug
i'm not sure if it's possible for ash to know which fields we're using
Its something that would be nice to address at some point
just to make queries more readable
If someone illustrated that it caused tangible perf issues aside from like the tiny bit of extra bandwidth for sending the query then we could prioritize it, but in all likelihood the analysis required to find out which fields are being used in filters/calculations while building the join would take more time than we'd save by not just sending the bigger query
the main point there is that its not actually selecting those attributes in the result set
ah makes sense
This is what causes it to take 300ms instead of 3msIt sounds like you have a "good" and a "bad" query in mind i.e the one that took 3ms
the good query was putting the condition on the primary key inside the subquery, but my test wasn't a good test because i was testing it in isolation not as part of the LEFT JOIN where the ON could take effect
Ah, interesting. Might be worth checking for missing indexes, like perhaps you need an index on
patient_id
Having a foreign key does not automatically create an index
You can tell Ash to generate one in the references
block
You don't always need indexes, and its not default for postgres to make them, so its opt-in on our end too
https://hexdocs.pm/ash_postgres/dsl-ashpostgres-datalayer.html#postgres-references-reference-index?
Yup I think I found the missing index. Not entirely missing, but two options and postgres is selecting the wrong one, so I need to create an optimal one. Thanks for the help and sorry to bother you about a non Ash issue!
No worries, this is a good help thread to have 😄
Just to conclude the findings here for anyone looking in the future:
- We added attribute multitenancy, which added an index on the practice_id
- The query planner switched to use that index instead of the existing index on visit_id that would have been much quicker (and was what it was using before)
- There was no index on both practice_id, visit_id which the query planner would have probably prioritized over both
- Added an index on [visit_id, practice_id] using all_tenants? true - that order is more efficient htan practice_id, visit_id which is why we did it with all_tenants
Execution time went from 2s -> 35ms 😄
🥳
Something I recommend also is using live dashboard's missing indexes tooling (which is just from pg stats IIRC)
and tools like pghero etc. (there is probably some new kid on the block doing stuff like that)
Its one of the "new difficulties" from using Ash is that often you're not thinking about exactly what your queries look like, but in my experience its honestly not a big deal, because you shouldn't add indexes unless reality proves that you need them etc.
And we have tools for that, and mechanisms to discover them