Potential bug with relationship `parent` references
Bit of a gnarly one ahead... š
I'm running onto what I think is a bug with
parent in relationship filters. My setup looks something like this:
1. A resource called GuideLevel with a has_many relationship with no_attributes? true and a filter. This relationship is called ancestors and uses an AshPostgres.Ltree column to get the ancestors of the current record, although idt the ltree behavior is relevant to the bug. The filter expression looks like: filter expr(fragment("? @> ? AND ? < ?", level, parent(level), nlevel, parent(nlevel))). Note that this references an nlevel calculation which just calculates the nlevel of the ltree column: calculate :nlevel, :integer, expr(fragment("nlevel(?)", level))
2. A few calculations on GuideLevel that use a calculation module, which implements a sort of "inheritance" structure for a given setting on the GuideLevel. It looks like this:
where source_field is the source to pull the given setting from.
(continued in thread)35 Replies
In isolation (that is, when testing these all out against
GuideLevel only, this works incredibly well. However, the problem comes up when we introduce the third resource:
3. A resource called Practice that is related to GuideLevel (guide levels have many practices). It just so happens that this resource has a column called level which is a smallint (i.e. NOT an ltree). I suspect this naming overlap is the source of the bug. Practices have a calculation that pulls settings from their guide levels:
Where effective_range is one of the aforementioned "inherited" calculations on the guide level resource
When I calculate range on a practice record, I'm getting SQL errors, and looking at the generated SQL, it looks something like this:
See the issue labeled in the SQL... somehow Ash seems to be confusing the level field on the practices table (which is a smallint) for the level field on the guide_levels table (which is an ltree). I end up getting errors from postgres: (Postgrex.Error) ERROR 42846 (cannot_coerce) cannot cast type smallint to ltree.
I presume there's somewhere in the ash source that looks through the available refs for a name match, but might not handle when there's multiple name matches... or something of the sort š
if anyone's got any ideas where this could be (or if there's some user error I'm missing) I'd appreciate a second pair of eyesThere is no place that we use a name match to determine the type of something for queries
at least...there shouldn't be
I'd look for a place in
ash_sql that uses Info.attribute and could somehow be passing the wrong attribute š¤
but I don't think thats it
sg0."level"::ltree @> a0."level"::ltreeWhy is that a problem? Those should be ltrees to make sense for
@>Yeah, so
a0.level is NOT an ltree, which is the problem. It's a completely unrelated smallint on the practices tableš¤
But you're doing
sg0."level"::ltree @> a0."level"::ltree
so it would be the reference that is wrong probably
not the type casting
are you on latest ash_sql?Yeah, the generated sql is wrong, is my point
yessir
š¢
i.e. the generated sql should be
sg0."level"::ltree @> g1."level"::ltree AND (nlevel(sg0."level"::ltree))::bigint < (nlevel(g1."level"::ltree))::bigintGotcha
Alright, will need a reproduction š
too many variables to really say
Yeah, thought so. I'll see if I can put one together, I'm definitely far down a combination of lots of slightly non-standard stuff š
It looks reasonable to me š
Just needs to be fixed ā¤ļø
well, at least fairly complex stuff, if not non-standard ha
Full Repro: https://github.com/ash-project/ash_postgres/pull/593
Can you open an issue as well? I will try to look in the next 2-3 days
It seems like not the easiest thing to figure out TBH š
So I will need to set aside a good chunk of time for it
Will do! I'm poking around a little bit in ash_sql myself, however my knowledge of the inner workings is limited so I'm definitely gonna timebox how much time I spend in there
Its unfortunately quite complex
but
what you can do
Would you prefer that issue in ash_postgres or ash_sql? I feel like the bug is more likely to be in ash_sql
Yeah, lets do
ash_sql
Is run claude code from the ash_sql repository, make your reproduction use ash_sql as a path dependency, and add a bash script in ash_sql that cds into your project and runs testsClaude takes all the fun out of it š
It does
š
I wanna see the gnarly details
I would also prefer that š
I actually like to code lol
I bring it up sometimes as a way to get people to debug something they maybe don't feel confident to debug and maybe produce extra info
fair enough
but if you're down to roll your sleeves up on this one then that is 100% better
no promises I'll deliver a fix haha but I will try
some guidance:
- The bug will likely be somewhere with how parent references resolve
- the relevant files are
join.ex and expr.ex
- bindings should be set to indicate the proper parent bindingsNice, that's helpful info, thanks!
I'm not to a solution yet, but I have sorta finger traced things into AshSql.Join.related_query. Somewhere a
parent_as binding index is winding up being 0 instead of 1
(I think š
)I've had 2 issues with the same thing in the past few weeks š
good luck lol
https://discord.com/channels/711271361523351632/1259099945366061110/1390314125959102534 related convo
make the PR that replaces all integer bindings with
make_ref š®Ooh interesting. Yeah I was seeing that
as: 500 popping up with the test case I'm using š
makes sense that that could create some weirdness
Alright after tinkering for a while, I'm thinking I'm unfamiliar enough with how this all works that I'm unlikely to get to a solution in the amount of time I have available.
What I can say is I think it's quite likely the issue is that somewhere in AshSql.Join.related_query (or in the functions it calls), a binding of 0 (the root binding) is being chosen, when a binding of 1 should be chosen (this is for the test case I added in the PR above). The fact that the root binding (FolderItem in the repro case) happens to have a level field is the only reason we get as far as we do, but I think the issue is simply that the chosen binding is wrong.
I've traced in there and at the beginning of that function, the issue is not present, but the query returned from it has the incorrect binding. result of IO.inspect of the return value of that function:
the parent_as(0).level is the relevant incorrect bindingIf you can write it up onto the issue I will look
probably tomorrow but maybe friday
will do
quick hack solution proposed before was add another 500 to the bindings count if you are in a
parent situation š¤£Solution
@Jesse Williams thanks for the tests, I've got a fix
š§
that's amazing
Is it in ash_sql?
yeah
just pushed