ashai vectorize

The code keeps re-embedding vectors for rows that were already embedded. I am on Postgres 16. The table has 1437 poems, but the poem_vectorize queue holds 4314 jobs:
sudo -u postgres psql -d t_db -c "select count(*) from poems;"
count
-------
1437
(1 row)
select count(id) from oban_jobs where queue = 'poem_vectorize';
count
-------
4314
(1 row)
10 Replies
ZachDaniel, 2mo ago
What does your trigger look like?
Abu kumathra (OP), 2mo ago
vectorize do
  attributes title: :vectorized_title,
             meter: :vectorized_meter,
             # rhyme: :vectorized_rhyme,
             topics: :vectorized_topics

  # strategy :manual
  strategy :ash_oban
  ash_oban_trigger_name :vectorize
  embedding_model Taleed.OpenAIEmbeddingModel
end

oban do
  triggers do
    trigger :vectorize do
      action :ash_ai_update_embeddings
      worker_read_action :read
      worker_module_name __MODULE__.AshOban.Worker.UpdateEmbeddings
      scheduler_module_name __MODULE__.AshOban.Scheduler.UpdateEmbeddings
      # list_tenants MyApp.ListTenants
    end
  end
end
This is the trigger. As you can see, all the jobs in the vectorize queue for the poems have completed.
ZachDaniel, 2mo ago
trigger :vectorize do
  action :ash_ai_update_embeddings
  worker_read_action :read
  scheduler_cron nil # <- add this
  worker_module_name __MODULE__.AshOban.Worker.UpdateEmbeddings
end
By default it runs on a schedule, which you don't want here. Is it like that in our docs? If so, please open an issue or, even better, a PR.
Abu kumathra (OP), 2mo ago
Yes, the docs actually do have scheduler_cron nil, but I think the compiler was complaining about something.
Abu kumathra (OP), 2mo ago
it should be set to false instead
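
For reference, a sketch of the trigger with that correction applied, combining the snippet suggested above with `false` in place of `nil`. It is a fragment of the resource's `oban` block rather than a standalone module, and it assumes the rest of the resource is unchanged.

```elixir
oban do
  triggers do
    trigger :vectorize do
      action :ash_ai_update_embeddings
      worker_read_action :read
      # `false` rather than `nil`, per the correction in this thread:
      # this switches off the periodic scheduler for the trigger.
      scheduler_cron false
      worker_module_name __MODULE__.AshOban.Worker.UpdateEmbeddings
    end
  end
end
```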
Abu kumathra (OP), 2mo ago
Also, the docs don't mention that the Oban queue needs to be added to config.exs. The scheduler idea is great, but it could be improved by batching the embedding calls (this is already done when several attributes are embedded together) and by only touching rows that have not been embedded yet. That would be a good feature. I am willing to help, but I will need links to guides, as I am new to the framework.
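
A minimal sketch of the queue entry being described, assuming a standard Oban setup. The app name `:taleed` and `Taleed.Repo` are guesses based on the `Taleed.OpenAIEmbeddingModel` module shown earlier, and the concurrency limit of 10 is arbitrary; only the queue name `poem_vectorize` comes from the thread.

```elixir
# config/config.exs
import Config

# :taleed and Taleed.Repo are assumed names, not confirmed in the thread.
config :taleed, Oban,
  repo: Taleed.Repo,
  # Queue name as seen in the oban_jobs query; 10 is an arbitrary concurrency limit.
  queues: [poem_vectorize: 10]
```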
ZachDaniel, 2mo ago
Yeah, we want to make ash_oban support bulk updates. Once we support something like a batch_size option, so that it fetches n records and bulk updates them, we can just use that in this action. Can you PR the addition of those options to the docs?
Abu kumathra (OP), 2mo ago
As far as I remember, OpenAI has a max token limit on embedding requests, so a way to handle that would be great too. Will do.
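
One way the token-limit concern could be handled is to group texts into batches that stay under a budget before sending them for embedding. The sketch below is plain Elixir and not part of ash_ai or ash_oban: `EmbeddingBatcher`, its functions, and the four-characters-per-token estimate are all hypothetical, and the actual limit would need to be checked against the provider's documentation.

```elixir
defmodule EmbeddingBatcher do
  @moduledoc "Hypothetical helper: groups texts so each batch stays under a rough token budget."

  # Crude estimate (about 4 characters per token); a real tokenizer would be more accurate.
  def estimate_tokens(text), do: div(String.length(text), 4) + 1

  # Splits `texts` into consecutive batches whose estimated token total
  # does not exceed `max_tokens`. A single oversized text gets its own batch.
  def batch(texts, max_tokens) do
    {batches, current, _used} =
      Enum.reduce(texts, {[], [], 0}, fn text, {batches, current, used} ->
        cost = estimate_tokens(text)

        if current != [] and used + cost > max_tokens do
          # Current batch is full: close it and start a new one with this text.
          {[Enum.reverse(current) | batches], [text], cost}
        else
          {batches, [text | current], used + cost}
        end
      end)

    # Flush the last open batch, if any, and restore the original order.
    batches = if current == [], do: batches, else: [Enum.reverse(current) | batches]
    Enum.reverse(batches)
  end
end
```

Each resulting batch could then be passed to the embedding model in a single request, which is the batching idea raised earlier in the thread.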
Abu kumathra (OP), 2mo ago
GitHub
impr: Switching off the scheduler+Mention of the need to add the qu...
