Using Postgres function in an insert with a field as a select

Ttheogravity5/25/2023
I'm trying to translate the following SQL:

INSERT INTO sb.text_document_chunk (text_document_id, text)
SELECT NEW.id, sentence
FROM unnest(string_to_array(NEW.text_content, '. ')) AS sentence
WHERE sentence ~ '[[:alnum:]]';

to:
db.insertInto('text_document_chunk').values((eb) => ({
    text_document_id: document.id,
    text: db
      .selectFrom(eb.fn('unnest', [eb.fn('string_to_array', [document.text_content, '. '])]).as('sentence'))
      .where('sentence', '~', '[[:alnum:]]'),
  })).execute()

It doesn't seem to like [document.text_content, '. '] and where('sentence'). It seems the array value has to be a ReferenceExpression type, but the documentation isn't very clear on what that is.
Ttheogravity5/25/2023
Would this be the correct version? One concern I have is does this properly handle issues around SQL injection?
db.insertInto('text_document_chunk').values((eb) => ({
    text_document_id: document.id,
    text: db
      .selectFrom(
        eb.fn('unnest', [eb.fn('string_to_array', [sql.lit(document.text_content), sql.lit('. ')])]).as('sentence'),
      )
      .where(sql.lit('sentence'), '~', '[[:alnum:]]').execute()
IIgal5/25/2023
Hey 👋

Very close! You can use .expression instead:

import { sql } from "kysely"

await kysely
  .insertInto("text_document_chunk")
  .columns(["text_document_id", "text"])
  .expression((eb) =>
    eb
      .selectFrom(
        eb
          .fn<{ sentence: string }>("unnest", [
            eb.fn("string_to_array", [sql`NEW.text_content`, sql.lit(". ")]),
          ])
          .as("sentence"),
      )
      .where("sentence", "~", "[[:alnum:]]")
      .select([sql<string>`NEW.id`.as("text_document_id"), "sentence"]),
  )
  .execute()

https://kyse.link/?p=s&i=bYFbOw1HNX4D81GQJOHI
Ttheogravity5/25/2023
Thanks! In my translation, I'm not using NEW (trying to translate out of triggers / pg functions), and have document instead as the object with the values. So would it be sql.lit(document.text_content)?
IIgal5/25/2023
sql.lit is used to add literal values to the query (instead of passing values as parameters). passing a string to it will result in 'document.text_content' which not what you're looking for. You need to refer to "document.text_content"
IIgal5/25/2023
so sql.ref
Ttheogravity5/25/2023
ah
IIgal5/25/2023
or eb.ref if the referenced column is in query context. eb.ref is type-safe.
Ttheogravity5/25/2023
this is the contents of document.text_content
Ttheogravity5/25/2023
i think i understand the mismatch, in the original fn, NEW.text_content is an actual column, whereas here I'm feeding in the content, which I don't think is allowed?
IIgal5/25/2023
Will continue tomorrow, good night 💤
Ttheogravity5/25/2023
thank you for your time Igal
Ttheogravity5/25/2023
really helped!
Ttheogravity5/25/2023
This works, but it's not ideal as sql.lit says it's susceptible to SQL injection. Is there a version that I can use that would be safe (eg parameter binding)? The document.text_content variable would come from user input, and the contents of that variable is whatever they wrote. We may or may not have sanitized it by the time it gets called here.

export async function generateTextDocumentChunks(
  db: Kysely<SbDatabase>,
  document: TextDocumentRow,
): Promise<TextDocumentChunkRow[]> {
  return db
    .insertInto('text_document_chunk')
    .columns(['text_document_id', 'text'])
    .expression((eb) =>
      eb
        .selectFrom(
          eb
            .fn<{ sentence: string }>('unnest', [
              eb.fn('string_to_array', [sql.lit(document.text_content), sql.lit('. ')]),
            ])
            .as('sentence'),
        )
        .where('sentence', '~', '[[:alnum:]]')
        .select([sql.lit(document.id).as('text_document_id'), 'sentence']),
    )
    .returningAll()
    .execute();
}
Ttheogravity5/25/2023
Reading the doc, doing:

sql`${value}`


should be the equiv? This does work for me

export async function generateTextDocumentChunks(
  db: Kysely<SbDatabase>,
  document: TextDocumentRow,
): Promise<TextDocumentChunkRow[]> {
  return db
    .insertInto('text_document_chunk')
    .columns(['text_document_id', 'text'])
    .expression((eb) =>
      eb
        .selectFrom(
          eb
            .fn<{ sentence: string }>('unnest', [
              eb.fn('string_to_array', [sql`${document.text_content}`, sql.lit('. ')]),
            ])
            .as('sentence'),
        )
        .where('sentence', '~', '[[:alnum:]]')
        .select([sql`${document.id}`.as('text_document_id'), 'sentence']),
    )
    .returningAll()
    .execute();
}
IIgal5/25/2023
You should always validate and sanitize user input!
IIgal5/25/2023
You should try sql.val instead of interpolation
Ttheogravity5/25/2023
Agreed! Thanks! Here's the final result:

export async function generateTextDocumentChunks(
  db: Kysely<SbDatabase>,
  document: TextDocumentRow,
): Promise<TextDocumentChunkRow[]> {
  return db
    .insertInto('text_document_chunk')
    .columns(['text_document_id', 'text'])
    .expression((eb) =>
      eb
        .selectFrom(
          eb
            .fn<{ sentence: string }>('unnest', [
              eb.fn('string_to_array', [sql.val(document.text_content), sql.val('. ')]),
            ])
            .as('sentence'),
        )
        .where('sentence', '~', '[[:alnum:]]')
        .select([sql.val(document.id).as('text_document_id'), 'sentence']),
    )
    .returningAll()
    .execute();
}