Dog dataset examples: pugs are expected to be the hardest to identify, while huskies would be the easiest
Researchers from the University of Campinas in Brazil have turned to computer vision to help people whose pets have gone missing. A moment of distraction is usually enough for a dog to get lost, and automated visual recognition is widely available and low-cost, yet it is an option that owners usually overlook. In an article posted on arxiv.org on Oct. 9, 2015 and published in 2016 in Multimedia Tools and Applications, the researchers set out to build a better model for dog face recognition, one that reduces the number of false positives reported by strangers who have found a similar-looking dog. They compared three ready-to-use human facial recognizers with two original solutions based on existing convolutional networks. The human facial recognizers did not achieve high accuracy on canine images, which shows that canine recognition is not a trivial task. The researchers therefore focused on the neural-network approaches.
There are several ways for owners to recover their dogs, but mandatory microchips and GPS trackers are not available in most countries. The main motivation of this work is therefore to identify lost dogs by their own appearance, which is intrinsic to each animal. Two datasets were used: Flickr-dog and Snoopybook (compiled at the university). For Flickr-dog, the researchers selected images under Creative Commons licenses, then cropped and aligned them. They chose pugs and huskies to represent different degrees of difficulty, since pugs are expected to be hard to tell apart while huskies are expected to be easy to recognize. They labeled the individuals by interpreting the picture metadata (description, timestamp, title, etc.), and cropped the pictures to reduce background information that could give the classifier unfair clues. In total, 374 photos were collected, covering the two breeds with 21 individuals per breed and at least 5 photos per individual. The Snoopybook dataset, on the other hand, has 18 mongrel dogs and a total of 251 photos, again with at least 5 photos per individual. Its set of individuals is less controlled, which makes it complementary to the well-controlled Flickr-dog dataset.
The human facial recognition models performed poorly. Eigenfaces, for example, focuses on reconstructability and emphasizes low-frequency information, although dog patterns remain clearly noticeable.
Convolutional neural networks seemed the most promising solution for canine facial recognition. The first proposal, a shallow architecture, used random weights, and the resulting features were fed to a linear support vector machine (SVM, a supervised classification model). This approach was called BARK – Best Architecture to Retrieve K9. The second original proposal used OverFeat, a pre-trained convolutional network, as a feature extractor, and its output was fed to a linear SVM as well. The second proposal was called WOOF – Wields Off-the-shelf OverFeat Features.
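To make the idea of random-weight convolutional features concrete, here is a minimal sketch in plain Python. It is not the authors' BARK architecture: the filter count, filter size, ReLU, global max pooling, and the toy image are all assumptions chosen for illustration. The point is only that the filters are drawn at random and never trained, and that the resulting feature vector is what a linear SVM would then consume (that training step is left out here).

```python
import random


def random_filters(n_filters, size, seed=0):
    """Draw square filters with Gaussian random weights; they are never trained."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
        for _ in range(n_filters)
    ]


def conv_relu(image, kernel):
    """Valid 2-D cross-correlation of image with kernel, followed by ReLU."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            s = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            row.append(max(0.0, s))
        out.append(row)
    return out


def extract_features(image, filters):
    """One feature per filter: the largest activation in its map (global max pooling)."""
    return [max(max(row) for row in conv_relu(image, f)) for f in filters]


# Hypothetical 6x6 grayscale "face crop" with values in [0, 1].
image = [[(r * 6 + c) / 35.0 for c in range(6)] for r in range(6)]
filters = random_filters(n_filters=8, size=3, seed=42)
features = extract_features(image, filters)  # list of 8 non-negative floats
```

In a full pipeline, one such feature vector would be computed per photo and the vectors, paired with the dogs' identities, would be passed to a linear classifier.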
Of the two convolutional neural network models, BARK and WOOF performed similarly on Flickr-dog, but WOOF performed better on Snoopybook. The results suggest that pet identification is a new frontier for computer vision. They show that recognizing individual dogs by their faces is possible with accuracies well above pure chance: for example, WOOF’s balanced average accuracy reached 66.9% on Flickr-dog and 89.48% on Snoopybook. There is still much room for improvement, but this is a first step in a huge challenge for computer vision and machine learning algorithms.
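The article does not spell out how the balanced average accuracy is computed. A common definition, assumed here, is the mean of the per-class recalls, so that individuals with many photos do not dominate the score. The sketch below follows that assumption, and the dog names and predictions are made up; the paper's exact evaluation protocol may differ.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls (assumed definition of balanced accuracy)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical example: four photos of "rex", two photos of "luna".
y_true = ["rex", "rex", "rex", "rex", "luna", "luna"]
y_pred = ["rex", "rex", "rex", "luna", "luna", "rex"]

score = balanced_accuracy(y_true, y_pred)
# rex recall = 3/4, luna recall = 1/2, so score = 0.625,
# whereas plain accuracy would be 4/6.
```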
- Moreira et al. (2016): “Where is my puppy? Retrieving lost dogs by facial features”, in: Multimed Tools Appl, doi:10.1007/s11042-016-3824-1