AF
Ash Framework•2mo ago
Phoop

Aggregate first() generates seemingly inefficient subquery

The following code
```elixir
calculate :cover_image_path,
  :string,
  expr(
    cover_image.path ||
      first(photos, field: :path, query: [sort: [inserted_at: :asc]])
  )
```
generates a query for the first() using array_agg, aggregating all the photos and then selecting the first value of the array. Is there a way to optimize this so it doesn't use array_agg and instead just does an ORDER BY x LIMIT 1 in the subquery? Example SQL query generated:
```sql
SELECT a0."id", a0."name", a0."description", a0."cover_image_id", a0."created_by_id",
       a0."end_date", a0."inserted_at", a0."start_date", a0."updated_at",
       coalesce(p1."path"::text, s2."first_photo_path"::text)::text
FROM "albums" AS a0
LEFT OUTER JOIN "public"."photos" AS p1 ON a0."cover_image_id" = p1."id"
LEFT OUTER JOIN LATERAL (
  SELECT sp0."album_id" AS "album_id",
         (array_agg(sp0."path"::text ORDER BY sp0."inserted_at" ASC)
            FILTER (WHERE sp0."path"::text IS NOT NULL))[1]::text AS "first_photo_path"
  FROM "public"."photos" AS sp0
  WHERE (a0."id" = sp0."album_id")
  GROUP BY sp0."album_id"
) AS s2 ON TRUE
WHERE (a0."id"::uuid = $1::uuid)
-- ["01988ff7-e02b-7f68-a1bb-4c3e2e6b803d"]
```
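For comparison, the shape being asked about here would be a lateral subquery that sorts and limits instead of aggregating. A hand-written sketch of that idea (column lists abbreviated, aliases reused from the generated query above; this is not what AshPostgres emits today):

```sql
SELECT a0."id", coalesce(p1."path", s2."first_photo_path")
FROM "albums" AS a0
LEFT OUTER JOIN "photos" AS p1 ON a0."cover_image_id" = p1."id"
LEFT OUTER JOIN LATERAL (
  -- Pick the single earliest non-null path per album,
  -- rather than aggregating all rows into an array.
  SELECT sp0."path" AS "first_photo_path"
  FROM "photos" AS sp0
  WHERE a0."id" = sp0."album_id" AND sp0."path" IS NOT NULL
  ORDER BY sp0."inserted_at" ASC
  LIMIT 1
) AS s2 ON TRUE
WHERE a0."id" = $1;
```

With an index on ("album_id", "inserted_at"), this form can stop after one row per album instead of scanning and aggregating every photo.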
ZachDaniel
ZachDaniel•2mo ago
If you're on a new version of Postgres, I think we use first_value, but there may be a simple optimization available there. Or if you set include_nil? false, we might.
Phoop
PhoopOP•2mo ago
Running on Postgres 17,
```elixir
calculate :cover_image_path,
  :string,
  expr(
    cover_image.path ||
      first(photos,
        include_nil?: false,
        field: :path,
        query: [sort: [inserted_at: :asc]]
      )
  )
```
also generates
```sql
SELECT a0."id", a0."name", a0."description", a0."start_date", a0."end_date",
       a0."created_by_id", a0."inserted_at", a0."updated_at", a0."cover_image_id",
       coalesce(p1."path"::text, s2."aggregate_0"::text)::text
FROM "albums" AS a0
LEFT OUTER JOIN "public"."photos" AS p1 ON a0."cover_image_id" = p1."id"
LEFT OUTER JOIN LATERAL (
  SELECT sp0."album_id" AS "album_id",
         (array_agg(sp0."path"::text ORDER BY sp0."inserted_at" ASC)
            FILTER (WHERE sp0."path"::text IS NOT NULL))[1]::text AS "aggregate_0"
  FROM "public"."photos" AS sp0
  WHERE (a0."id" = sp0."album_id")
  GROUP BY sp0."album_id"
) AS s2 ON TRUE
WHERE (a0."id"::uuid = $1::uuid)
-- ["01989014-0c6c-7820-92f0-b5ff42d9a45a"]
```
Want me to log an issue?
ZachDaniel
ZachDaniel•2mo ago
What does your min_postgres_version function in your repo say?
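For anyone following along: the minimum version is declared in the repo module. A minimal sketch, assuming a recent AshPostgres where the callback is named min_pg_version/0 and returns a %Version{} struct (MyApp.Repo and the :my_app OTP app name are placeholders):

```elixir
defmodule MyApp.Repo do
  use AshPostgres.Repo, otp_app: :my_app

  # Tells AshPostgres it may rely on Postgres 17 features
  # (such as any_value) when generating SQL.
  def min_pg_version do
    %Version{major: 17, minor: 0, patch: 0}
  end
end
```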
Phoop
PhoopOP•2mo ago
Ah! Today I learned that is a thing 🙂 Bumped that to 17 and now got the following:
```sql
SELECT a0."id", a0."name", a0."description", a0."cover_image_id", a0."created_by_id",
       a0."end_date", a0."inserted_at", a0."start_date", a0."updated_at",
       coalesce(p1."path"::text, s2."aggregate_0"::text)::text
FROM "albums" AS a0
LEFT OUTER JOIN "public"."photos" AS p1 ON a0."cover_image_id" = p1."id"
LEFT OUTER JOIN LATERAL (
  SELECT sp0."album_id" AS "album_id",
         any_value(sp0."path"::text ORDER BY sp0."inserted_at" ASC)
           FILTER (WHERE sp0."path"::text IS NOT NULL)::text AS "aggregate_0"
  FROM "public"."photos" AS sp0
  WHERE (a0."id" = sp0."album_id")
  GROUP BY sp0."album_id"
) AS s2 ON TRUE
WHERE (a0."id"::uuid = '01989035-6ac1-70c0-a1e5-d9fd0beb192c'::uuid);
```
Don't know if it's optimal now or not; I'd never seen the any_value function before. Let me do some reading. Thanks so far! According to Postgres this is still a bit suboptimal:
```
Nested Loop Left Join (cost=8.72..367.73 rows=51 width=164) (actual time=0.259..0.837 rows=51 loops=1)
  ->  Hash Right Join (cost=2.15..5.53 rows=51 width=256) (actual time=0.131..0.153 rows=51 loops=1)
        Hash Cond: (p1.id = a0.cover_image_id)
        ->  Seq Scan on photos p1 (cost=0.00..3.00 rows=100 width=140) (actual time=0.007..0.024 rows=100 loops=1)
        ->  Hash (cost=1.51..1.51 rows=51 width=132) (actual time=0.080..0.081 rows=51 loops=1)
              Buckets: 1024 Batches: 1 Memory Usage: 14kB
              ->  Seq Scan on albums a0 (cost=0.00..1.51 rows=51 width=132) (actual time=0.027..0.035 rows=51 loops=1)
  ->  GroupAggregate (cost=6.57..7.08 rows=1 width=48) (actual time=0.013..0.013 rows=0 loops=51)
        ->  Sort (cost=6.57..6.82 rows=100 width=148) (actual time=0.012..0.012 rows=2 loops=51)
              Sort Key: sp0.inserted_at
              Sort Method: quicksort  Memory: 25kB
              ->  Seq Scan on photos sp0 (cost=0.00..3.25 rows=100 width=148) (actual time=0.009..0.010 rows=2 loops=51)
                    Filter: (a0.id = album_id)
                    Rows Removed by Filter: 98
Planning Time: 0.928 ms
Execution Time: 0.987 ms
```
This is with the last WHERE clause removed, to have some more data to work with.
ZachDaniel
ZachDaniel•2mo ago
Yeah, if there is just one first aggregate, we should lift the limit up into the query.
