Potential bug with relationship `parent` references
Bit of a gnarly one ahead... š
I'm running onto what I think is a bug with
parent
in relationship filters. My setup looks something like this:
1. A resource called GuideLevel
with a has_many relationship with no_attributes? true
and a filter. This relationship is called ancestors
and uses an AshPostgres.Ltree column to get the ancestors of the current record, although idt the ltree behavior is relevant to the bug. The filter
expression looks like: filter expr(fragment("? @> ? AND ? < ?", level, parent(level), nlevel, parent(nlevel)))
. Note that this references an nlevel
calculation which just calculates the nlevel
of the ltree column: calculate :nlevel, :integer, expr(fragment("nlevel(?)", level))
2. A few calculations on GuideLevel
that use a calculation module, which implements a sort of "inheritance" structure for a given setting on the GuideLevel
. It looks like this:
where source_field
is the source to pull the given setting from.
(continued in thread)35 Replies
In isolation (that is, when testing these all out against
GuideLevel
only, this works incredibly well. However, the problem comes up when we introduce the third resource:
3. A resource called Practice
that is related to GuideLevel
(guide levels have many practices). It just so happens that this resource has a column called level
which is a smallint (i.e. NOT an ltree). I suspect this naming overlap is the source of the bug. Practices have a calculation that pulls settings from their guide levels:
Where effective_range
is one of the aforementioned "inherited" calculations on the guide level resource
When I calculate range
on a practice record, I'm getting SQL errors, and looking at the generated SQL, it looks something like this:
See the issue labeled in the SQL... somehow Ash seems to be confusing the level
field on the practices table (which is a smallint
) for the level
field on the guide_levels
table (which is an ltree
). I end up getting errors from postgres: (Postgrex.Error) ERROR 42846 (cannot_coerce) cannot cast type smallint to ltree
.
I presume there's somewhere in the ash source that looks through the available refs for a name match, but might not handle when there's multiple name matches... or something of the sort š
if anyone's got any ideas where this could be (or if there's some user error I'm missing) I'd appreciate a second pair of eyesThere is no place that we use a name match to determine the type of something for queries
at least...there shouldn't be
I'd look for a place in
ash_sql
that uses Info.attribute
and could somehow be passing the wrong attribute š¤
but I don't think thats it
sg0."level"::ltree @> a0."level"::ltreeWhy is that a problem? Those should be ltrees to make sense for
@>
Yeah, so
a0.level
is NOT an ltree, which is the problem. It's a completely unrelated smallint
on the practices
tableš¤
But you're doing
sg0."level"::ltree @> a0."level"::ltree
so it would be the reference that is wrong probably
not the type casting
are you on latest ash_sql
?Yeah, the generated sql is wrong, is my point
yessir
š¢
i.e. the generated sql should be
sg0."level"::ltree @> g1."level"::ltree AND (nlevel(sg0."level"::ltree))::bigint < (nlevel(g1."level"::ltree))::bigint
Gotcha
Alright, will need a reproduction š
too many variables to really say
Yeah, thought so. I'll see if I can put one together, I'm definitely far down a combination of lots of slightly non-standard stuff š
It looks reasonable to me š
Just needs to be fixed ā¤ļø
well, at least fairly complex stuff, if not non-standard ha
Full Repro: https://github.com/ash-project/ash_postgres/pull/593
Can you open an issue as well? I will try to look in the next 2-3 days
It seems like not the easiest thing to figure out TBH š
So I will need to set aside a good chunk of time for it
Will do! I'm poking around a little bit in ash_sql myself, however my knowledge of the inner workings is limited so I'm definitely gonna timebox how much time I spend in there
Its unfortunately quite complex
but
what you can do
Would you prefer that issue in ash_postgres or ash_sql? I feel like the bug is more likely to be in ash_sql
Yeah, lets do
ash_sql
Is run claude code from the ash_sql
repository, make your reproduction use ash_sql
as a path dependency, and add a bash script in ash_sql
that cds into your project and runs testsClaude takes all the fun out of it š
It does
š
I wanna see the gnarly details
I would also prefer that š
I actually like to code lol
I bring it up sometimes as a way to get people to debug something they maybe don't feel confident to debug and maybe produce extra info
fair enough
but if you're down to roll your sleeves up on this one then that is 100% better
no promises I'll deliver a fix haha but I will try
some guidance:
- The bug will likely be somewhere with how parent references resolve
- the relevant files are
join.ex
and expr.ex
- bindings should be set to indicate the proper parent bindingsNice, that's helpful info, thanks!
I'm not to a solution yet, but I have sorta finger traced things into AshSql.Join.related_query. Somewhere a
parent_as
binding index is winding up being 0 instead of 1
(I think š
)I've had 2 issues with the same thing in the past few weeks š
good luck lol
https://discord.com/channels/711271361523351632/1259099945366061110/1390314125959102534 related convo
make the PR that replaces all integer bindings with
make_ref
š®Ooh interesting. Yeah I was seeing that
as: 500
popping up with the test case I'm using š
makes sense that that could create some weirdness
Alright after tinkering for a while, I'm thinking I'm unfamiliar enough with how this all works that I'm unlikely to get to a solution in the amount of time I have available.
What I can say is I think it's quite likely the issue is that somewhere in AshSql.Join.related_query
(or in the functions it calls), a binding of 0
(the root binding) is being chosen, when a binding of 1
should be chosen (this is for the test case I added in the PR above). The fact that the root binding (FolderItem
in the repro case) happens to have a level
field is the only reason we get as far as we do, but I think the issue is simply that the chosen binding is wrong.
I've traced in there and at the beginning of that function, the issue is not present, but the query returned from it has the incorrect binding. result of IO.inspect of the return value of that function:
the parent_as(0).level
is the relevant incorrect bindingIf you can write it up onto the issue I will look
probably tomorrow but maybe friday
will do
quick hack solution proposed before was add another 500 to the bindings count if you are in a
parent
situation š¤£Solution
@Jesse Williams thanks for the tests, I've got a fix
š§
that's amazing
Is it in ash_sql?
yeah
just pushed