AF
Ash Framework•2mo ago
Phoop

Aggregate first() generates seemingly inefficient subquery

The following code
```elixir
calculate :cover_image_path,
  :string,
  expr(
    cover_image.path ||
      first(photos, field: :path, query: [sort: [inserted_at: :asc]])
  )
```
generates a query for the first() using array_agg, aggregating all the photos and then selecting the first value of the array. Is there a way to optimize this so it doesn't use array_agg and instead just does an ORDER BY x LIMIT 1 in the subquery? Example SQL query generated:
```sql
SELECT a0."id", a0."name", a0."description", a0."cover_image_id", a0."created_by_id",
       a0."end_date", a0."inserted_at", a0."start_date", a0."updated_at",
       coalesce(p1."path"::text, s2."first_photo_path"::text)::text
FROM "albums" AS a0
LEFT OUTER JOIN "public"."photos" AS p1 ON a0."cover_image_id" = p1."id"
LEFT OUTER JOIN LATERAL (
  SELECT sp0."album_id" AS "album_id",
         (array_agg(sp0."path"::text ORDER BY sp0."inserted_at" ASC)
            FILTER (WHERE sp0."path"::text IS NOT NULL))[1]::text AS "first_photo_path"
  FROM "public"."photos" AS sp0
  WHERE (a0."id" = sp0."album_id")
  GROUP BY sp0."album_id"
) AS s2 ON TRUE
WHERE (a0."id"::uuid = $1::uuid)
-- ["01988ff7-e02b-7f68-a1bb-4c3e2e6b803d"]
```
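For comparison, the shape being asked about here would be a lateral subquery that sorts and limits instead of aggregating. A hand-written sketch of that idea (column lists abbreviated, aliases reused from the generated query above; this is not what AshPostgres emits today):

```sql
SELECT a0."id", coalesce(p1."path", s2."first_photo_path")
FROM "albums" AS a0
LEFT OUTER JOIN "photos" AS p1 ON a0."cover_image_id" = p1."id"
LEFT OUTER JOIN LATERAL (
  -- Pick the single earliest non-null path per album,
  -- rather than aggregating all rows into an array.
  SELECT sp0."path" AS "first_photo_path"
  FROM "photos" AS sp0
  WHERE a0."id" = sp0."album_id" AND sp0."path" IS NOT NULL
  ORDER BY sp0."inserted_at" ASC
  LIMIT 1
) AS s2 ON TRUE
WHERE a0."id" = $1;
```

With an index on ("album_id", "inserted_at"), this form can stop after one row per album instead of scanning and aggregating every photo.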
ZachDaniel
ZachDaniel•2mo ago
If you're on a new version of Postgres, I think we use first_value, but there may be a simple optimization available there. Or if you set include_nil? false, we might.
Phoop
PhoopOP•2mo ago
Running on Postgres 17,
```elixir
calculate :cover_image_path,
  :string,
  expr(
    cover_image.path ||
      first(photos,
        include_nil?: false,
        field: :path,
        query: [sort: [inserted_at: :asc]]
      )
  )
```
also generates
```sql
SELECT a0."id", a0."name", a0."description", a0."start_date", a0."end_date",
       a0."created_by_id", a0."inserted_at", a0."updated_at", a0."cover_image_id",
       coalesce(p1."path"::text, s2."aggregate_0"::text)::text
FROM "albums" AS a0
LEFT OUTER JOIN "public"."photos" AS p1 ON a0."cover_image_id" = p1."id"
LEFT OUTER JOIN LATERAL (
  SELECT sp0."album_id" AS "album_id",
         (array_agg(sp0."path"::text ORDER BY sp0."inserted_at" ASC)
            FILTER (WHERE sp0."path"::text IS NOT NULL))[1]::text AS "aggregate_0"
  FROM "public"."photos" AS sp0
  WHERE (a0."id" = sp0."album_id")
  GROUP BY sp0."album_id"
) AS s2 ON TRUE
WHERE (a0."id"::uuid = $1::uuid)
-- ["01989014-0c6c-7820-92f0-b5ff42d9a45a"]
```
Want me to log an issue?
ZachDaniel
ZachDaniel•2mo ago
What does your min_postgres_version function in your repo say?
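For anyone following along: the minimum version is declared in the repo module. A minimal sketch, assuming a recent AshPostgres where the callback is named min_pg_version/0 and returns a %Version{} struct (MyApp.Repo and the :my_app OTP app name are placeholders):

```elixir
defmodule MyApp.Repo do
  use AshPostgres.Repo, otp_app: :my_app

  # Tells AshPostgres it may rely on Postgres 17 features
  # (such as any_value) when generating SQL.
  def min_pg_version do
    %Version{major: 17, minor: 0, patch: 0}
  end
end
```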
Phoop
PhoopOP•2mo ago
Ah! Today I learned that is a thing 🙂 Bumped that to 17 and now got the following:
```sql
SELECT a0."id", a0."name", a0."description", a0."cover_image_id", a0."created_by_id",
       a0."end_date", a0."inserted_at", a0."start_date", a0."updated_at",
       coalesce(p1."path"::text, s2."aggregate_0"::text)::text
FROM "albums" AS a0
LEFT OUTER JOIN "public"."photos" AS p1 ON a0."cover_image_id" = p1."id"
LEFT OUTER JOIN LATERAL (
  SELECT sp0."album_id" AS "album_id",
         any_value(sp0."path"::text ORDER BY sp0."inserted_at" ASC)
           FILTER (WHERE sp0."path"::text IS NOT NULL)::text AS "aggregate_0"
  FROM "public"."photos" AS sp0
  WHERE (a0."id" = sp0."album_id")
  GROUP BY sp0."album_id"
) AS s2 ON TRUE
WHERE (a0."id"::uuid = '01989035-6ac1-70c0-a1e5-d9fd0beb192c'::uuid);
```
Don't know if it's optimal now or not; I'd never seen the any_value function before. Let me do some reading. Thanks so far! According to Postgres this is still a bit suboptimal:
```
Nested Loop Left Join (cost=8.72..367.73 rows=51 width=164) (actual time=0.259..0.837 rows=51 loops=1)
  ->  Hash Right Join (cost=2.15..5.53 rows=51 width=256) (actual time=0.131..0.153 rows=51 loops=1)
        Hash Cond: (p1.id = a0.cover_image_id)
        ->  Seq Scan on photos p1 (cost=0.00..3.00 rows=100 width=140) (actual time=0.007..0.024 rows=100 loops=1)
        ->  Hash (cost=1.51..1.51 rows=51 width=132) (actual time=0.080..0.081 rows=51 loops=1)
              Buckets: 1024 Batches: 1 Memory Usage: 14kB
              ->  Seq Scan on albums a0 (cost=0.00..1.51 rows=51 width=132) (actual time=0.027..0.035 rows=51 loops=1)
  ->  GroupAggregate (cost=6.57..7.08 rows=1 width=48) (actual time=0.013..0.013 rows=0 loops=51)
        ->  Sort (cost=6.57..6.82 rows=100 width=148) (actual time=0.012..0.012 rows=2 loops=51)
              Sort Key: sp0.inserted_at
              Sort Method: quicksort  Memory: 25kB
              ->  Seq Scan on photos sp0 (cost=0.00..3.25 rows=100 width=148) (actual time=0.009..0.010 rows=2 loops=51)
                    Filter: (a0.id = album_id)
                    Rows Removed by Filter: 98
Planning Time: 0.928 ms
Execution Time: 0.987 ms
```
This is with the last WHERE clause removed, to have some more data to work with.
ZachDaniel
ZachDaniel•2mo ago
Yeah, if there is just one first aggregate, we should lift the limit up into the query.
