Potential bug with relationship `parent` references

Bit of a gnarly one ahead... šŸ˜… I'm running onto what I think is a bug with parent in relationship filters. My setup looks something like this: 1. A resource called GuideLevel with a has_many relationship with no_attributes? true and a filter. This relationship is called ancestors and uses an AshPostgres.Ltree column to get the ancestors of the current record, although idt the ltree behavior is relevant to the bug. The filter expression looks like: filter expr(fragment("? @> ? AND ? < ?", level, parent(level), nlevel, parent(nlevel))). Note that this references an nlevel calculation which just calculates the nlevel of the ltree column: calculate :nlevel, :integer, expr(fragment("nlevel(?)", level)) 2. A few calculations on GuideLevel that use a calculation module, which implements a sort of "inheritance" structure for a given setting on the GuideLevel. It looks like this:
def expression(opts) do
source_field = Keyword.fetch!(opts, :source_field)
default_value = Keyword.fetch!(opts, :default_value)
expr(
if is_nil(^ref(source_field)) do
# closest ancestor with a non-nil source_field
first(ancestors,
query: [
sort: [nlevel: :desc],
filter: expr(not is_nil(^ref(source_field)))
],
field: ^source_field
) || ^default_value
else
^ref(source_field)
end
)
end
def expression(opts) do
source_field = Keyword.fetch!(opts, :source_field)
default_value = Keyword.fetch!(opts, :default_value)
expr(
if is_nil(^ref(source_field)) do
# closest ancestor with a non-nil source_field
first(ancestors,
query: [
sort: [nlevel: :desc],
filter: expr(not is_nil(^ref(source_field)))
],
field: ^source_field
) || ^default_value
else
^ref(source_field)
end
)
end
where source_field is the source to pull the given setting from. (continued in thread)
Solution:
@Jesse Williams thanks for the tests, I've got a fix
Jump to solution
35 Replies
Jesse Williams
Jesse WilliamsOP•3mo ago
In isolation (that is, when testing these all out against GuideLevel only, this works incredibly well. However, the problem comes up when we introduce the third resource: 3. A resource called Practice that is related to GuideLevel (guide levels have many practices). It just so happens that this resource has a column called level which is a smallint (i.e. NOT an ltree). I suspect this naming overlap is the source of the bug. Practices have a calculation that pulls settings from their guide levels:
calculate :range, :integer, expr(guide_level.effective_range)
calculate :range, :integer, expr(guide_level.effective_range)
Where effective_range is one of the aforementioned "inherited" calculations on the guide level resource When I calculate range on a practice record, I'm getting SQL errors, and looking at the generated SQL, it looks something like this:
SELECT
a0."id",
-- <...various fields of practices>
(
-- <... CASE statement implementing the `if/else` from the guide level calculation>
)
FROM practices AS a0
LEFT OUTER JOIN "public"."guide_levels" AS g1 ON a0."guide_level_id" = "g1.id"
LEFT OUTER JOIN LATERAL (
SELECT
(
-- <array_agg getting setting for each ancestor>
)[1]::bigint AS "aggregate_1" -- <- get the first one
FROM "public"."guide_levels" AS sg0
WHERE (
sg0."level"::ltree @> a0."level"::ltree -- <---------- THIS IS THE ISSUE
AND (nlevel(sg0."level"::ltree))::bigint < (nlevel(a0."level"::ltree))::bigint -- <---- THIS TOO
)
) AS s2 ON TRUE
-- WHERE <... adaptive_practice id filter>
SELECT
a0."id",
-- <...various fields of practices>
(
-- <... CASE statement implementing the `if/else` from the guide level calculation>
)
FROM practices AS a0
LEFT OUTER JOIN "public"."guide_levels" AS g1 ON a0."guide_level_id" = "g1.id"
LEFT OUTER JOIN LATERAL (
SELECT
(
-- <array_agg getting setting for each ancestor>
)[1]::bigint AS "aggregate_1" -- <- get the first one
FROM "public"."guide_levels" AS sg0
WHERE (
sg0."level"::ltree @> a0."level"::ltree -- <---------- THIS IS THE ISSUE
AND (nlevel(sg0."level"::ltree))::bigint < (nlevel(a0."level"::ltree))::bigint -- <---- THIS TOO
)
) AS s2 ON TRUE
-- WHERE <... adaptive_practice id filter>
See the issue labeled in the SQL... somehow Ash seems to be confusing the level field on the practices table (which is a smallint) for the level field on the guide_levels table (which is an ltree). I end up getting errors from postgres: (Postgrex.Error) ERROR 42846 (cannot_coerce) cannot cast type smallint to ltree. I presume there's somewhere in the ash source that looks through the available refs for a name match, but might not handle when there's multiple name matches... or something of the sort šŸ˜… if anyone's got any ideas where this could be (or if there's some user error I'm missing) I'd appreciate a second pair of eyes
ZachDaniel
ZachDaniel•3mo ago
There is no place that we use a name match to determine the type of something for queries at least...there shouldn't be I'd look for a place in ash_sql that uses Info.attribute and could somehow be passing the wrong attribute šŸ¤” but I don't think thats it
sg0."level"::ltree @> a0."level"::ltree
Why is that a problem? Those should be ltrees to make sense for @>
Jesse Williams
Jesse WilliamsOP•3mo ago
Yeah, so a0.level is NOT an ltree, which is the problem. It's a completely unrelated smallint on the practices table
ZachDaniel
ZachDaniel•3mo ago
šŸ¤” But you're doing sg0."level"::ltree @> a0."level"::ltree so it would be the reference that is wrong probably not the type casting are you on latest ash_sql?
Jesse Williams
Jesse WilliamsOP•3mo ago
Yeah, the generated sql is wrong, is my point yessir
ZachDaniel
ZachDaniel•3mo ago
😢
Jesse Williams
Jesse WilliamsOP•3mo ago
i.e. the generated sql should be sg0."level"::ltree @> g1."level"::ltree AND (nlevel(sg0."level"::ltree))::bigint < (nlevel(g1."level"::ltree))::bigint
ZachDaniel
ZachDaniel•3mo ago
Gotcha Alright, will need a reproduction šŸ˜„ too many variables to really say
Jesse Williams
Jesse WilliamsOP•3mo ago
Yeah, thought so. I'll see if I can put one together, I'm definitely far down a combination of lots of slightly non-standard stuff šŸ˜…
ZachDaniel
ZachDaniel•3mo ago
It looks reasonable to me šŸ˜„ Just needs to be fixed ā¤ļø
Jesse Williams
Jesse WilliamsOP•3mo ago
well, at least fairly complex stuff, if not non-standard ha Full Repro: https://github.com/ash-project/ash_postgres/pull/593
ZachDaniel
ZachDaniel•3mo ago
Can you open an issue as well? I will try to look in the next 2-3 days It seems like not the easiest thing to figure out TBH šŸ™‚ So I will need to set aside a good chunk of time for it
Jesse Williams
Jesse WilliamsOP•3mo ago
Will do! I'm poking around a little bit in ash_sql myself, however my knowledge of the inner workings is limited so I'm definitely gonna timebox how much time I spend in there
ZachDaniel
ZachDaniel•3mo ago
Its unfortunately quite complex but what you can do
Jesse Williams
Jesse WilliamsOP•3mo ago
Would you prefer that issue in ash_postgres or ash_sql? I feel like the bug is more likely to be in ash_sql
ZachDaniel
ZachDaniel•3mo ago
Yeah, lets do ash_sql Is run claude code from the ash_sql repository, make your reproduction use ash_sql as a path dependency, and add a bash script in ash_sql that cds into your project and runs tests
Jesse Williams
Jesse WilliamsOP•3mo ago
Claude takes all the fun out of it šŸ˜‚
ZachDaniel
ZachDaniel•3mo ago
It does šŸ˜†
Jesse Williams
Jesse WilliamsOP•3mo ago
I wanna see the gnarly details
ZachDaniel
ZachDaniel•3mo ago
I would also prefer that šŸ˜†
Jesse Williams
Jesse WilliamsOP•3mo ago
I actually like to code lol
ZachDaniel
ZachDaniel•3mo ago
I bring it up sometimes as a way to get people to debug something they maybe don't feel confident to debug and maybe produce extra info
Jesse Williams
Jesse WilliamsOP•3mo ago
fair enough
ZachDaniel
ZachDaniel•3mo ago
but if you're down to roll your sleeves up on this one then that is 100% better
Jesse Williams
Jesse WilliamsOP•3mo ago
no promises I'll deliver a fix haha but I will try
ZachDaniel
ZachDaniel•3mo ago
some guidance: - The bug will likely be somewhere with how parent references resolve - the relevant files are join.ex and expr.ex - bindings should be set to indicate the proper parent bindings
Jesse Williams
Jesse WilliamsOP•3mo ago
Nice, that's helpful info, thanks! I'm not to a solution yet, but I have sorta finger traced things into AshSql.Join.related_query. Somewhere a parent_as binding index is winding up being 0 instead of 1 (I think šŸ˜… )
kernel
kernel•3mo ago
I've had 2 issues with the same thing in the past few weeks šŸ™‚ good luck lol https://discord.com/channels/711271361523351632/1259099945366061110/1390314125959102534 related convo make the PR that replaces all integer bindings with make_ref 😮
Jesse Williams
Jesse WilliamsOP•3mo ago
Ooh interesting. Yeah I was seeing that as: 500 popping up with the test case I'm using šŸ˜… makes sense that that could create some weirdness Alright after tinkering for a while, I'm thinking I'm unfamiliar enough with how this all works that I'm unlikely to get to a solution in the amount of time I have available. What I can say is I think it's quite likely the issue is that somewhere in AshSql.Join.related_query (or in the functions it calls), a binding of 0 (the root binding) is being chosen, when a binding of 1 should be chosen (this is for the test case I added in the PR above). The fact that the root binding (FolderItem in the repro case) happens to have a level field is the only reason we get as far as we do, but I think the issue is simply that the chosen binding is wrong. I've traced in there and at the beginning of that function, the issue is not present, but the query returned from it has the incorrect binding. result of IO.inspect of the return value of that function:
{:ok,
#Ecto.Query<from f0 in AshPostgres.Test.Support.ComplexCalculations.Folder,
as: 2,
where: fragment(
"(? @> ? AND ? < ?)",
type(as(2).level, {:parameterized, {AshPostgres.Ltree.EctoType, []}}),
type(parent_as(0).level, {:parameterized, {AshPostgres.Ltree.EctoType, []}}),
type(
fragment(
"(nlevel(?))",
type(as(2).level, {:parameterized, {AshPostgres.Ltree.EctoType, []}})
),
{:parameterized, {Ash.Type.Integer.EctoType, []}}
),
type(
fragment(
"(nlevel(?))",
type(parent_as(0).level, {:parameterized, {Ash.Type.Integer.EctoType, []}})
),
{:parameterized, {Ash.Type.Integer.EctoType, []}}
)
)>}
{:ok,
#Ecto.Query<from f0 in AshPostgres.Test.Support.ComplexCalculations.Folder,
as: 2,
where: fragment(
"(? @> ? AND ? < ?)",
type(as(2).level, {:parameterized, {AshPostgres.Ltree.EctoType, []}}),
type(parent_as(0).level, {:parameterized, {AshPostgres.Ltree.EctoType, []}}),
type(
fragment(
"(nlevel(?))",
type(as(2).level, {:parameterized, {AshPostgres.Ltree.EctoType, []}})
),
{:parameterized, {Ash.Type.Integer.EctoType, []}}
),
type(
fragment(
"(nlevel(?))",
type(parent_as(0).level, {:parameterized, {Ash.Type.Integer.EctoType, []}})
),
{:parameterized, {Ash.Type.Integer.EctoType, []}}
)
)>}
the parent_as(0).level is the relevant incorrect binding
ZachDaniel
ZachDaniel•3mo ago
If you can write it up onto the issue I will look probably tomorrow but maybe friday
Jesse Williams
Jesse WilliamsOP•3mo ago
will do
kernel
kernel•3mo ago
quick hack solution proposed before was add another 500 to the bindings count if you are in a parent situation 🤣
Solution
ZachDaniel
ZachDaniel•3mo ago
@Jesse Williams thanks for the tests, I've got a fix
Jesse Williams
Jesse WilliamsOP•3mo ago
šŸ§™ that's amazing Is it in ash_sql?
ZachDaniel
ZachDaniel•3mo ago
yeah just pushed

Did you find this page helpful?