Why does searching for "ice cream" match each word separately in CLIP models?

I tried a few of the CLIP models on Hugging Face and they all do the same thing.

They seem to recognize ice cream, but they also return pictures that contain just ice.

Is this just a fault of the CLIP models not recognizing ice cream properly?