Aggregate first() generates seemingly inefficient subquery
The following calculation generates a query for `first()` using `array_agg`, selecting all photos and taking the first value of the array. Is there a way to optimize this to not use `array_agg` and instead use an `ORDER BY x LIMIT 1` in the subquery? The calculation and the SQL it generates:
calculate :cover_image_path,
:string,
expr(
cover_image.path ||
first(photos, field: :path, query: [sort: [inserted_at: :asc]])
)
SELECT a0."id", a0."name", a0."description", a0."cover_image_id", a0."created_by_id",
       a0."end_date", a0."inserted_at", a0."start_date", a0."updated_at",
       coalesce(p1."path"::text, s2."first_photo_path"::text)::text
FROM "albums" AS a0
LEFT OUTER JOIN "public"."photos" AS p1 ON a0."cover_image_id" = p1."id"
LEFT OUTER JOIN LATERAL (
  SELECT sp0."album_id" AS "album_id",
         (array_agg(sp0."path"::text ORDER BY sp0."inserted_at" ASC) FILTER (WHERE sp0."path"::text IS NOT NULL))[1]::text AS "first_photo_path"
  FROM "public"."photos" AS sp0
  WHERE (a0."id" = sp0."album_id")
  GROUP BY sp0."album_id"
) AS s2 ON TRUE
WHERE (a0."id"::uuid = $1::uuid) ["01988ff7-e02b-7f68-a1bb-4c3e2e6b803d"]
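For reference, the shape the question is asking about would be something like the following. This is a hand-written sketch, not output from Ash; table, column, and alias names are borrowed from the generated query above:

```sql
-- Sketch: replace the array_agg subquery with ORDER BY ... LIMIT 1
-- inside the lateral subquery, picking one row instead of aggregating.
SELECT a0."id",
       coalesce(p1."path", s2."first_photo_path")
FROM "albums" AS a0
LEFT OUTER JOIN "public"."photos" AS p1 ON a0."cover_image_id" = p1."id"
LEFT OUTER JOIN LATERAL (
  -- Select the single oldest non-null photo path directly.
  SELECT sp0."path" AS "first_photo_path"
  FROM "public"."photos" AS sp0
  WHERE sp0."album_id" = a0."id"
    AND sp0."path" IS NOT NULL
  ORDER BY sp0."inserted_at" ASC
  LIMIT 1
) AS s2 ON TRUE
WHERE a0."id" = $1;
```

The lateral subquery then returns at most one row per album, so no grouping or aggregate is needed.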
If you're on a new version of Postgres I think we use `first_value`, but there may be a simple optimization available there.
Or if you set `include_nil?: false` we might

Running on Postgres 17, it also generates the following. Want me to log an issue?
calculate :cover_image_path,
:string,
expr(
cover_image.path ||
first(photos,
include_nil?: false,
field: :path,
query: [sort: [inserted_at: :asc]]
)
)
SELECT a0."id", a0."name", a0."description", a0."start_date", a0."end_date",
       a0."created_by_id", a0."inserted_at", a0."updated_at", a0."cover_image_id",
       coalesce(p1."path"::text, s2."aggregate_0"::text)::text
FROM "albums" AS a0
LEFT OUTER JOIN "public"."photos" AS p1 ON a0."cover_image_id" = p1."id"
LEFT OUTER JOIN LATERAL (
  SELECT sp0."album_id" AS "album_id",
         (array_agg(sp0."path"::text ORDER BY sp0."inserted_at" ASC) FILTER (WHERE sp0."path"::text IS NOT NULL))[1]::text AS "aggregate_0"
  FROM "public"."photos" AS sp0
  WHERE (a0."id" = sp0."album_id")
  GROUP BY sp0."album_id"
) AS s2 ON TRUE
WHERE (a0."id"::uuid = $1::uuid) ["01989014-0c6c-7820-92f0-b5ff42d9a45a"]
What does your `min_postgres_version` function in your repo say?

Ah! Today I learned that is a thing 🙂 Bumped that to 17 and now got the following:
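For anyone else who hadn't seen it either: the repo module generated by `ash_postgres` can declare the minimum Postgres version it targets. A sketch of what bumping it looks like; in current `ash_postgres` the callback is `min_pg_version/0`, but treat the exact name and return shape as an assumption and check your own generated repo module:

```elixir
defmodule MyApp.Repo do
  use AshPostgres.Repo, otp_app: :my_app

  # Tells ash_postgres which SQL features it may emit; with 17 it can
  # use newer aggregates such as any_value (added in Postgres 16).
  def min_pg_version do
    %Version{major: 17, minor: 0, patch: 0}
  end
end
```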
Don't know if it's optimal now or not, I've never seen the `any_value` function before. Let me do some reading. Thanks so far!
According to Postgres this is still a bit suboptimal. This is with the last WHERE clause removed, to have some more data to work with:
SELECT a0."id", a0."name", a0."description", a0."cover_image_id", a0."created_by_id",
       a0."end_date", a0."inserted_at", a0."start_date", a0."updated_at",
       coalesce(p1."path"::text, s2."aggregate_0"::text)::text
FROM "albums" AS a0
LEFT OUTER JOIN "public"."photos" AS p1 ON a0."cover_image_id" = p1."id"
LEFT OUTER JOIN LATERAL (
  SELECT sp0."album_id" AS "album_id",
         any_value(sp0."path"::text ORDER BY sp0."inserted_at" ASC) FILTER (WHERE sp0."path"::text IS NOT NULL)::text AS "aggregate_0"
  FROM "public"."photos" AS sp0
  WHERE (a0."id" = sp0."album_id")
  GROUP BY sp0."album_id"
) AS s2 ON TRUE
WHERE (a0."id"::uuid = '01989035-6ac1-70c0-a1e5-d9fd0beb192c'::uuid);
Nested Loop Left Join (cost=8.72..367.73 rows=51 width=164) (actual time=0.259..0.837 rows=51 loops=1)
-> Hash Right Join (cost=2.15..5.53 rows=51 width=256) (actual time=0.131..0.153 rows=51 loops=1)
Hash Cond: (p1.id = a0.cover_image_id)
-> Seq Scan on photos p1 (cost=0.00..3.00 rows=100 width=140) (actual time=0.007..0.024 rows=100 loops=1)
-> Hash (cost=1.51..1.51 rows=51 width=132) (actual time=0.080..0.081 rows=51 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 14kB
-> Seq Scan on albums a0 (cost=0.00..1.51 rows=51 width=132) (actual time=0.027..0.035 rows=51 loops=1)
-> GroupAggregate (cost=6.57..7.08 rows=1 width=48) (actual time=0.013..0.013 rows=0 loops=51)
-> Sort (cost=6.57..6.82 rows=100 width=148) (actual time=0.012..0.012 rows=2 loops=51)
Sort Key: sp0.inserted_at
Sort Method: quicksort Memory: 25kB
-> Seq Scan on photos sp0 (cost=0.00..3.25 rows=100 width=148) (actual time=0.009..0.010 rows=2 loops=51)
Filter: (a0.id = album_id)
Rows Removed by Filter: 98
Planning Time: 0.928 ms
Execution Time: 0.987 ms
Yeah, if there is just one `first` aggregate we should lift the limit up to the query.