I’m super confused by how this dataset is organized. The first link doesn’t tell me much about it, so how do I download ImageNet? The GitHub repo is interesting, though: there seems to be a labeled example image for each subset of each animal. Are there enough images here for a model to properly differentiate between species? The Kaggle 1000-class mini dataset is also intriguing; is a dataset that small enough to recognize the images in it? Some of the pictures in a fish folder include a person holding the fish. How does the dataset split an image like that into subsections?
I tried playing the Emoji Scavenger Hunt game and found it quite intuitive. When tasked with finding a book, I had no trouble: I pointed my camera at a bookshelf and it recognized the books immediately. The model misclassified my laptop as a TV, which was actually great because I didn’t have a TV. But it was completely unable to recognize my socks as socks, no matter how I positioned them (possibly because of the socks’ color?). The game’s model seems to struggle with subtle distinctions between visually similar items, and with objects that can appear in many different positions and states.
I found that with the MobileNet model, stock images, particularly those featuring a single item against a clean background, were recognized easily. But when I tested images with multiple objects, such as a real estate listing photo of a bathroom, the model classified the scene as a medicine box, presumably due to the presence of a cabinet. And when I put in a drawing (which might not be fair to the model, since it wasn’t trained on drawings), it was completely misclassified.
This suggests that the model might struggle with context and multiple objects in more complex scenes. Factors like object position and lighting also seemed to affect accuracy: objects photographed at unusual angles or partially obscured were more likely to be misclassified.