Bad join query when using attribute tenancy

Hi! I'm noticing a major performance bug when using attribute-based multi-tenancy in our app: We have an a resource Visit that has: belongs_to :patient, Practices.Patient, public?: true both tables have an attribute practice_id which is the multitenancy attribute Then I have a calculation that looks like this:
calculate :patient_display_name, :string do
public? true

calculation expr(
cond do
not is_nil(patient.last_name) ->
patient.first_name <> " " <> patient.last_name
true ->
nil
end
)
end
calculate :patient_display_name, :string do
public? true

calculation expr(
cond do
not is_nil(patient.last_name) ->
patient.first_name <> " " <> patient.last_name
true ->
nil
end
)
end
But loading this calculation causes a terrible query:
SELECT
v0.id,
-- patient_display_name calculation
CASE
WHEN s1.last_name IS NOT NULL THEN
s1.first_name || ' ' || s1.last_name
ELSE NULL
END AS patient_display_name

FROM visits v0

-- Patient subquery (loads ALL columns unnecessarily)
LEFT JOIN (
SELECT
sp0.id,
sp0.first_name, -- ✓ needed
sp0.last_name, -- ✓ needed
sp0.custom_id, -- ❌ unnecessary
sp0.hidden_at, -- ❌ unnecessary
... [ a bunch of other unnecessary fields ] ...
sp0.practice_id, -- ❌ unnecessary (already filtering on it)

FROM patients sp0
WHERE sp0.practice_id = $practice_id
) s1 ON v0.patient_id = s1.id

WHERE
v0.id = $visit_id
AND v0.practice_id = $practice_id;
SELECT
v0.id,
-- patient_display_name calculation
CASE
WHEN s1.last_name IS NOT NULL THEN
s1.first_name || ' ' || s1.last_name
ELSE NULL
END AS patient_display_name

FROM visits v0

-- Patient subquery (loads ALL columns unnecessarily)
LEFT JOIN (
SELECT
sp0.id,
sp0.first_name, -- ✓ needed
sp0.last_name, -- ✓ needed
sp0.custom_id, -- ❌ unnecessary
sp0.hidden_at, -- ❌ unnecessary
... [ a bunch of other unnecessary fields ] ...
sp0.practice_id, -- ❌ unnecessary (already filtering on it)

FROM patients sp0
WHERE sp0.practice_id = $practice_id
) s1 ON v0.patient_id = s1.id

WHERE
v0.id = $visit_id
AND v0.practice_id = $practice_id;
The unnecessary fields aren't terrible, but loading the entire patient table to get one causes this query to take 300ms instead of a few ms Is there a way around this? Ash version: 3.5.13 Ash postgres: 2.5.22
23 Replies
ZachDaniel
ZachDaniel3d ago
🤔 are you sure that the unnecessary selected attributes has anything to do w/ the query performance?
rohan
rohanOP3d ago
it's not the selected attributes, it's the fact that it loads the entire patient table filter by tenant just to select one id and that i confirmed via explain analyze I should mention the Patient is a belongs_to association on Visit
ZachDaniel
ZachDaniel3d ago
🤔 to be clear, you're saying that specifically removing those unnecessary fields from the select gets you a better query plan?
rohan
rohanOP3d ago
no sorry I think I made this confusing There are two issues with this query: 1) It's loading unnecessary attributes which aren't required in the calculation - this is minor in this particular case, though in other cases could cause a problem 2) It's doing a subquery to load the entire table filtered by practice_id, and only then joining on the id. So it loads 1500 rows to select 1 of them, when it could filter to that row immediately within the subquery since the patient is a belongs_to association <-- This is what causes it to take 300ms instead of 3ms
ZachDaniel
ZachDaniel3d ago
huh There is no filter for the relationship attributes in that query?
rohan
rohanOP3d ago
this subquery
-- Patient subquery
LEFT JOIN (
SELECT
sp0.id,
sp0.first_name, -- ✓ needed
sp0.last_name, -- ✓ needed
sp0.custom_id, -- ❌ unnecessary
sp0.hidden_at, -- ❌ unnecessary
... [ a bunch of other unnecessary fields ] ...
sp0.practice_id, -- ❌ unnecessary (already filtering on it)

FROM patients sp0
WHERE sp0.practice_id = $practice_id
) s1 ON v0.patient_id = s1.id
-- Patient subquery
LEFT JOIN (
SELECT
sp0.id,
sp0.first_name, -- ✓ needed
sp0.last_name, -- ✓ needed
sp0.custom_id, -- ❌ unnecessary
sp0.hidden_at, -- ❌ unnecessary
... [ a bunch of other unnecessary fields ] ...
sp0.practice_id, -- ❌ unnecessary (already filtering on it)

FROM patients sp0
WHERE sp0.practice_id = $practice_id
) s1 ON v0.patient_id = s1.id
it's first filtering by practice_id (tenant), and then only after the subquery completes, it does the ON to filter by id if it filtered by id within the subquery it'd be much faster (i think)
ZachDaniel
ZachDaniel3d ago
You should confirm that 😄
rohan
rohanOP3d ago
ok let me try - it's hard to test this in prod where i have enough data but i can check it
ZachDaniel
ZachDaniel3d ago
Oh, I think I see what you mean Its just not filtering on the multitenancy attribute at all I get you practice_id is a literal value that we have when building the query
rohan
rohanOP3d ago
let me double check before i send you down the wrong path 🙂 running explain analyze on the whole thing No sorry I take it back - it does apply the primary key index before applying the other filter. I think the issue is elsewhere
ZachDaniel
ZachDaniel3d ago
Yeah, I'm not actually sure the kind of query you're suggesting is possible the outer binding is not available inside the subquery
rohan
rohanOP3d ago
though is the overselecting of attributes still a bug?
ZachDaniel
ZachDaniel3d ago
Not a bug
rohan
rohanOP3d ago
i'm not sure if it's possible for ash to know which fields we're using
ZachDaniel
ZachDaniel3d ago
Its something that would be nice to address at some point just to make queries more readable If someone illustrated that it caused tangible perf issues aside from like the tiny bit of extra bandwidth for sending the query then we could prioritize it, but in all likelihood the analysis required to find out which fields are being used in filters/calculations while building the join would take more time than we'd save by not just sending the bigger query the main point there is that its not actually selecting those attributes in the result set
rohan
rohanOP3d ago
ah makes sense
ZachDaniel
ZachDaniel3d ago
This is what causes it to take 300ms instead of 3ms
It sounds like you have a "good" and a "bad" query in mind i.e the one that took 3ms
rohan
rohanOP3d ago
the good query was putting the condition on the primary key inside the subquery, but my test wasn't a good test because i was testing it in isolation not as part of the LEFT JOIN where the ON could take effect
ZachDaniel
ZachDaniel3d ago
Ah, interesting. Might be worth checking for missing indexes, like perhaps you need an index on patient_id Having a foreign key does not automatically create an index You can tell Ash to generate one in the references block You don't always need indexes, and its not default for postgres to make them, so its opt-in on our end too https://hexdocs.pm/ash_postgres/dsl-ashpostgres-datalayer.html#postgres-references-reference-index?
references do
reference :patient, index?: true
end
references do
reference :patient, index?: true
end
rohan
rohanOP3d ago
Yup I think I found the missing index. Not entirely missing, but two options and postgres is selecting the wrong one, so I need to create an optimal one. Thanks for the help and sorry to bother you about a non Ash issue!
ZachDaniel
ZachDaniel3d ago
No worries, this is a good help thread to have 😄
rohan
rohanOP3d ago
Just to conclude the findings here for anyone looking in the future: - We added attribute multitenancy, which added an index on the practice_id - The query planner switched to use that index instead of the existing index on visit_id that would have been much quicker (and was what it was using before) - There was no index on both practice_id, visit_id which the query planner would have probably prioritized over both - Added an index on [visit_id, practice_id] using all_tenants? true - that order is more efficient htan practice_id, visit_id which is why we did it with all_tenants Execution time went from 2s -> 35ms 😄
ZachDaniel
ZachDaniel3d ago
🥳 Something I recommend also is using live dashboard's missing indexes tooling (which is just from pg stats IIRC) and tools like pghero etc. (there is probably some new kid on the block doing stuff like that) Its one of the "new difficulties" from using Ash is that often you're not thinking about exactly what your queries look like, but in my experience its honestly not a big deal, because you shouldn't add indexes unless reality proves that you need them etc. And we have tools for that, and mechanisms to discover them

Did you find this page helpful?