The worker has a serious bug. I checked the output logs and found no issues with the data before upload. However, precision was lost after uploading.
There is no difference in the upsert operation logic between worker bindings and REST API.
I log the data before uploading, and it is still correct. After uploading, however, the precision is lost.


And what do you see when you try to send a getById operation via the worker binding?
When uploading, the value was 0.011006837710738182, but after uploading it became 0.011006838.
What logs are you looking at?

This is what I got back when querying by ID.
I log the data before calling upsert, and then I query it back using getByIds. The two are inconsistent.
I'll look into this during my business hours. The upsert logic is shared between worker bindings and the HTTP API, so I am wondering if this is some TypeScript parsing issue.
You may be describing the ideal case, where the logic is shared. The Python SDK, for example, cannot upsert at all, and the official GitHub says they are investigating that issue.
Ideally, it would be possible. But in reality, it's not feasible.
I inserted 2 million 512-dimensional vectors before discovering this bug.
Can you link the github issue/comment?
The GitHub issue is about the Python SDK. I gave up on Python and switched to Workers, and I didn't expect to hit another problem there.
I asked ChatGPT and learned that the values changed because my data was float64 and was converted to float32 during processing. How can I debug the data the worker sends internally?
When can we reach a conclusion? Please check. I am currently storing a large amount of data for nothing, and I have wasted R2 reads on it.
Vector float data in Vectorize is stored in 32 bits (i.e., f32, not f64).
We will update the docs to reflect that.
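For anyone who wants to check this locally: a minimal sketch of the f64 to f32 rounding, run without Vectorize at all. It assumes the stored value is simply the nearest 32-bit float, which is what `Math.fround` computes. The helper names `toStoredPrecision` and `survivesRoundTrip` are made up for illustration and are not part of any Cloudflare API.

```typescript
// Round every component to the nearest 32-bit float, which is what a
// 32-bit store would keep (assumption: plain round-to-nearest downcast).
function toStoredPrecision(values: number[]): number[] {
  return values.map((v) => Math.fround(v));
}

// True when the values read back match the values sent once both are
// rounded to f32, i.e. any difference is explained by the downcast alone.
function survivesRoundTrip(sent: number[], readBack: number[]): boolean {
  if (sent.length !== readBack.length) return false;
  return sent.every((v, i) => Math.fround(v) === Math.fround(readBack[i] ?? NaN));
}

const sent = [0.1, 0.011006837710738182, 0.5];
const stored = toStoredPrecision(sent);

console.log(stored[0] === sent[0]); // false: 0.1 is not exactly representable in f32
console.log(stored[2] === sent[2]); // true: 0.5 is exactly representable in f32
console.log(survivesRoundTrip(sent, stored)); // true: only the downcast changed the data
```

If `survivesRoundTrip` returns true for what you logged before upsert versus what getByIds returns, the difference is just the 32-bit storage and not data corruption.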
Got it. Will it be upgraded to 64-bit in the future? I would also like a method to obtain a list of all IDs.
we will have a method to obtain a list of all vectors in the index.
it is possible.
But I found that query behaves differently from upsert. My upsert loses precision, and I understand the reason now. However, when I queried with the original high-precision vector, the similarity was not 100%.

I inserted the same data into the index and searched again, and surprisingly, the similarity was 82%. This still feels like a bug.
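As a rough sanity check on how much the 32-bit rounding could matter here, this sketch computes the cosine similarity between a float64 vector and its float32-rounded copy. It assumes a cosine metric, and the 512-dimensional test vector is made-up data, not taken from the index in question.

```typescript
// Plain cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return dot / Math.sqrt(normA * normB);
}

// Deterministic 512-dimensional vector (illustrative values only).
const original = Array.from({ length: 512 }, (_, i) => Math.sin(i + 1));
// The same vector after rounding each component to a 32-bit float.
const downcast = original.map((v) => Math.fround(v));

const similarity = cosineSimilarity(original, downcast);
console.log(similarity > 0.999999); // true
```

Under these assumptions the downcast alone leaves the similarity essentially at 1, so it would not seem to account for a score as low as 82%.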

VectorFloatArray includes both 32-bit and 64-bit types.
Try using high-precision scoring as explained here: https://developers.cloudflare.com/vectorize/best-practices/query-vectors/#control-over-scoring-precision-and-query-accuracy
Yes! That is the TypeScript Worker type definition, which correctly accepts both 32-bit and 64-bit floats because Vectorize accepts both. The 64-bit values are downcast to 32 bits, as explained above.