The worker has a serious bug. I checked the output logs and found no issues with the data before uploading. However, precision was lost after uploading.
24 Replies
garvitg, 2mo ago
There is no difference in the upsert operation logic between worker bindings and REST API.
yinxingmaiming6409 (OP), 2mo ago
I log the data before uploading, and it is still correct. After uploading, however, the precision is lost.
garvitg, 2mo ago
And what do you see when you try to send a getById operation via the worker binding?
yinxingmaiming6409 (OP), 2mo ago
Before uploading, the value was 0.011006837710738182; after uploading, it became 0.011006838.
garvitg, 2mo ago
What logs are you looking at?
yinxingmaiming6409 (OP), 2mo ago
This is what I get back when I look the vector up by ID. I log the data before calling upsert, and then I query the same data using getByIds. The two are inconsistent.
garvitg, 2mo ago
I'll look into this during my business hours. The upsert logic is shared between worker bindings and the HTTP API, so I am wondering if this is some TypeScript parsing issue.
yinxingmaiming6409 (OP), 2mo ago
Shared logic may be the ideal, but it reminds me of the Python SDK, which cannot use upsert. The official GitHub says they are investigating that issue. In theory it should work, but in practice it does not. I inserted 2 million 512-dimensional vectors before discovering this bug.
garvitg, 2mo ago
Can you link the GitHub issue/comment?
yinxingmaiming6409 (OP), 2mo ago
The GitHub issue is about the Python SDK. I gave up on Python and switched to Workers, and I did not expect to hit another problem there. I asked ChatGPT and learned that the values change because my data is float64 and is converted to float32 for processing. How can I debug the data the worker sends internally? When can we reach a conclusion? Please check, because I am currently storing a large amount of data for nothing. It has also wasted R2 read operations.
Jerome, 2mo ago
Vector float data in Vectorize is stored in 32 bits (i.e., f32, not f64).
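A minimal sketch of the rounding described above, assuming it behaves like a standard float64 to float32 conversion. It uses only built-in JavaScript (`Math.fround`); the helper name `toFloat32` is made up for illustration and is not part of any Vectorize API.

```typescript
// Hypothetical helper: round a JS number (float64) to the nearest float32 value.
function toFloat32(x: number): number {
  return Math.fround(x);
}

// The value reported earlier in the thread, as logged before upsert.
const original = 0.011006837710738182;
const stored = toFloat32(original);

// float32 keeps a 24-bit significand, so rounding a normal value changes it
// by a relative error of at most 2^-24 (roughly 6e-8).
const relativeError = Math.abs(stored - original) / Math.abs(original);
const withinFloat32Precision = relativeError <= Math.pow(2, -24);

// Rounding an already-rounded value changes nothing.
const idempotent = toFloat32(stored) === stored;

console.log({ original, stored, relativeError, withinFloat32Precision, idempotent });
```

If this assumption holds, applying `Math.fround` to values before logging them should make the pre-upsert log comparable to what a later read returns.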
yevgen, 2mo ago
We will update the docs to reflect that.
yinxingmaiming6409 (OP), 2mo ago
Got it. Will Vectorize be upgraded to 64-bit in the future? I also hope there will be a method to obtain a list of all IDs.
yevgen, 2mo ago
We will have a method to obtain a list of all vectors in the index. It is possible.
yinxingmaiming6409 (OP), 2mo ago
But I found that query behaves differently from upsert. My upsert loses precision, and I now understand the reason. However, when I ran a query with the high-precision vector, the similarity was not 100%.
yinxingmaiming6409 (OP), 2mo ago
I inserted the same data into the index and searched again, and surprisingly the similarity was 82%. This still looks like a bug.
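As a rough sanity check on how much float32 rounding alone can move a similarity score, the sketch below compares a 512-dimensional float64 vector with its float32 copy using cosine similarity. The test vector is synthetic, `FloatVector` and `cosineSimilarity` are local names made up for this example, and the thread does not say which distance metric the index uses, so cosine is an assumption.

```typescript
// Local alias for this sketch only.
type FloatVector = Float32Array | Float64Array;

// Cosine similarity computed in double precision.
function cosineSimilarity(a: FloatVector, b: FloatVector): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / Math.sqrt(normA * normB);
}

// Synthetic 512-dimensional vector (deterministic, not data from the thread).
const dims = 512;
const vector64 = new Float64Array(dims);
for (let i = 0; i < dims; i++) {
  vector64[i] = Math.sin(i + 1);
}

// Downcast copy: each element is rounded to the nearest float32.
const vector32 = new Float32Array(vector64);

const similarity = cosineSimilarity(vector64, vector32);
console.log(similarity);
```

In this sketch the rounded copy stays extremely close to the original, which suggests that a score as low as 82% would need some other explanation; the thread does not establish what that is.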
yinxingmaiming6409 (OP), 2mo ago
The VectorFloatArray type includes both 32-bit and 64-bit floats.
Jerome, 2mo ago
Cloudflare Docs
Query vectors
Querying an index, or vector search, enables you to search an index by providing an input vector and returning the nearest vectors based on the configured distance metric.
Jerome, 2mo ago
Yes! That is the TypeScript Worker type definition, which correctly accepts both 32-bit and 64-bit floats because Vectorize accepts both. 64-bit values are downcast to 32 bits, though, as explained earlier in the thread.
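To see what such a downcast looks like in plain TypeScript, here is a small sketch using the built-in typed arrays. It only illustrates the standard `Float64Array` to `Float32Array` conversion and is not Vectorize's internal code path; apart from the first element, which is the value quoted in the thread, the sample values are arbitrary.

```typescript
// Sample values held as float64, the default precision of JS numbers.
const values64 = new Float64Array([0.011006837710738182, 0.1, 1 / 3]);

// Constructing a Float32Array from another typed array converts each element
// to the nearest float32 value.
const values32 = new Float32Array(values64);

// Each converted element equals Math.fround of the original element.
const matchesFround = values64.every((v, i) => values32[i] === Math.fround(v));

// Reading the float32 values back as ordinary numbers exposes the rounding:
// for example, the second element is no longer exactly 0.1.
const roundTrip = Array.from(values32);

console.log({ matchesFround, roundTrip });
```

Converting to `Float32Array` on the client before upserting is one way to make logged values directly comparable with what a later read returns, assuming the service rounds the same way.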
