Invalid Byte Encoding with Postgres JSONB columns

I've been running a fairly successful website with drizzle-orm but have recently come across a problem that I cannot for the life of me figure out. Any insight into a solution, or even how to debug this further would be helpful.

I have a pretty standard setup -
  • postgres16 [docker image]
  • Web frontend using nuxt. Data used for the backend API is provided by drizzle.
I have a few tables that store information in jsonb columns. This data contains strings (peoples names) that often have characters outside the standard en-us character set. For a long time (almost a year) this has worked seemingly well for me and I had not noticed any errors. Recently though, I've started to receive errors like this from postgres -

invalid byte sequence for encoding "UTF8": 0xf8
invalid byte sequence for encoding "UTF8": 0xe5 0x6b 0x61

What I found out from investigating this is that the data being sent to postgres is using an encoding like USC-2 . The byte sequence that is reported as error is the unicode representation of the extended character not the utf-8 sequence. I determined this by running db.execute(sql'${JSON.stringify(entries)}::bytea') and then dumping the resulting buffer with toString('hex')

What's really crazy about this is that I can only get it to happen on my production server after about 30 minutes of uptime. Visiting a page triggers an update of a row in the database - The update, with extended characters, works immediately after the service comes up, but will then consistently fail after some amount of time.

I started with the postgres.js driver and assumed that it was a bug in their implementation, so I switched to node-postgres and ran into the same issue.

Right now I have no real guesses on what is going wrong - all the code is pretty straight forward. Maybe something with server side rendering is changing the default encoding output of JSON.stringify?

Any help is appreciated!
Was this page helpful?