Who’s behind the camera: computer vision and author identification
Aug 24, 2015
Predicting authorship and detecting forgeries have long been applications of computer vision in art and science. Humans, as visual beings, have a remarkable ability to draw conclusions from a piece of artwork even without formal training: where it was made, what kind of artist made it, how it was made, and what it depicts. Computer vision techniques have been used to detect forged paintings and copies, and to analyze the brush strokes and shading used. Numerical quantities commonly used to distinguish a forgery from an original are complexity and self-organization, since copies tend to be more technical and exact, without errors or underlying corrections. Although substantial progress has been made in painting categorization, which matches authors with their works based on style, perspective and shading, until recently no attempts had been made to recognize the authorship of photographs. That is surprising, since the average person today is exposed to far more photographs than paintings and similar artwork on a daily basis.
The cues used for classifying paintings do not transfer easily to photography, since stylistic differences are far less apparent: there are no analogue techniques such as brush strokes or smooth versus hard brushes that are specific to the medium. Identifying the author of a photograph is therefore a much harder task than identifying the author of a painting, especially for unedited images. Few attributes are available for artist classification, and with similar subjects the stylistic variations are extremely subtle. While earlier research focused mainly on detecting forgeries, as in the case of Van Gogh's paintings, a more recent line of work classifies artwork by artistic style and period, such as baroque, impressionist or cubist pieces.
Thomas and Kovashka published a study a couple of days ago that aims to classify photographs according to their makers. Their dataset covers 25 well-known photographers and contains 119,806 images of varying resolutions, spanning from the early 20th century up to modern-day images. Most of the images were crawled from the Library of Congress' photo archives, and the remainder was obtained from the National Library of Australia. Each entry records the ID of the photographer who took the photo, its title, a summary and subject where known, and the URL from which it was added. A broad space of features is used for identification and classification. Some are low-level features, where each dimension of the feature vector is derived directly from the visual data at a certain position in the image; others are high-level features, where each dimension has a semantic meaning, often corresponding to the presence of a certain object in the image.
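As a minimal sketch of what one dataset entry might look like (the field names and values below are illustrative, not the paper's actual schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PhotoRecord:
    """One entry of the photographer dataset, as described in the post.

    Field names are assumptions for illustration; the authors' real
    schema may differ.
    """
    photographer_id: str            # ID of the photographer who took the photo
    title: str
    summary: Optional[str] = None
    subject: Optional[str] = None   # only present when known
    url: str = ""                   # source URL the image was crawled from

# Hypothetical example record (not real dataset content).
record = PhotoRecord(
    photographer_id="van_vechten",
    title="Portrait of an unidentified sitter",
    subject="portraits",
    url="https://www.loc.gov/...",  # placeholder, not a real item URL
)
print(record.photographer_id, record.subject)
```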
Low-level features include a color histogram, which captures whether a photographer works in black and white, in color, or in a combination of both, and GIST, a holistic representation of the visual field that estimates properties such as the openness and ruggedness of the scene. An intermediate-level feature is SURF (Speeded-Up Robust Features), a local feature detector and descriptor usually used for object recognition and classification. K-means clustering with k = 500 was run over the SURF descriptors to obtain a vocabulary of 500 visual words; a normalized histogram of these words is then extracted for each test image, forming a 500-dimensional descriptor. High-level features include the object bank and deep convolutional networks. The object bank descriptor is created by running object detectors over an image with spatial pooling, which encodes the location of each detection in the descriptor and thereby preserves the spatial relationships between objects. Deep convolutional networks, in which individual neurons are tiled so that they respond to overlapping regions of the visual field, were tested in two variants, CaffeNet and Hybrid-CNN, each with 60 million parameters and 650,000 neurons.
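The bag-of-visual-words step can be sketched as follows. This is a toy, numpy-only version under assumptions: random vectors stand in for real SURF descriptors, the vocabulary is shrunk from the paper's k = 500 to k = 50, and plain Lloyd's k-means replaces whatever clustering implementation the authors used:

```python
import numpy as np

def build_vocabulary(descriptors, k=50, iters=5, seed=0):
    """Cluster pooled local descriptors (n, d) into k visual words."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest cluster center.
        dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned descriptors.
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def bovw_histogram(image_descriptors, centers):
    """Encode one image as a normalized histogram over the visual words."""
    dists = np.linalg.norm(image_descriptors[:, None, :] - centers[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()

# Toy demo: 64-dimensional "SURF-like" descriptors drawn at random.
rng = np.random.default_rng(1)
train_desc = rng.normal(size=(2000, 64))   # descriptors pooled over training images
vocab = build_vocabulary(train_desc, k=50) # k = 500 in the paper
img_desc = rng.normal(size=(300, 64))      # descriptors from one test image
h = bovw_histogram(img_desc, vocab)        # one k-dimensional descriptor, sums to 1
print(h.shape)
```

Each test image thus ends up as a single fixed-length vector, regardless of how many local descriptors it produced, which is what makes it usable as SVM input.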
For the experimental evaluation, 20 images were randomly sampled from each photographer to produce a test set of 460 images, with the remaining images used as the training set. A multiclass support vector machine with a linear kernel was trained for each feature, using class weights so that all photographers carry equal weight during training despite the varying number of training images per author. The experiment was repeated ten times with different samples, for a total of 200 test images per photographer.
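A rough sketch of that evaluation setup, assuming scikit-learn's `LinearSVC` as the linear multiclass SVM and synthetic 500-dimensional vectors in place of the real image features; the three "photographers" and their image counts are invented for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in data: 3 classes with deliberately imbalanced
# training counts, mimicking authors with different archive sizes.
rng = np.random.default_rng(0)
X_parts, y_parts = [], []
for label, n_images in [(0, 400), (1, 150), (2, 60)]:
    center = rng.normal(scale=2.0, size=500)        # class-specific feature profile
    X_parts.append(rng.normal(loc=center, size=(n_images, 500)))
    y_parts.append(np.full(n_images, label))
X, y = np.vstack(X_parts), np.concatenate(y_parts)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# class_weight='balanced' reweights classes inversely to their frequency,
# so photographers with fewer training images count equally, as in the paper.
clf = LinearSVC(class_weight="balanced")
clf.fit(X_tr, y_tr)
print("test accuracy:", round(clf.score(X_te, y_te), 3))
```

On this cleanly separable toy data the accuracy is near perfect; the point is only the balanced weighting, not the score.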
Among other results, the paper reports the top 10 most represented categories and links individual photographers to their most common features. For example, Van Vechten's photographs are almost exclusively portraits and therefore contain people, yet the SVM was less successful with people than with clothing, the most prominent category in Van Vechten's table, covering items such as bow ties, suits, sweatshirts, trenchcoats and cloaks. Given the attributes that appear most commonly and most prominently for each photographer, we can tell a lot about their style and about the subjects and themes they tend to depict.
The experiments showed that high-level features perform significantly better overall than low-level ones, and that some features, such as clothing, are more discriminative than prominent but generic ones like people. The results are promising, but higher performance will be needed before these findings can be applied in practice. Other improvements would include adding various lexical features to reduce false positives, automating the search for an optimal combination of features through a feature-selection process, and increasing the number of images and descriptions in the dataset.
- Blessing, Alexander and Kai Wen (2010): "Using Machine Learning for Identification of Art Paintings", Technical Report, Stanford University.
- Karayev et al. (2013): "Recognizing Image Style", arXiv.org.
- Thomas, Christopher and Adriana Kovashka (2015): "Who's Behind the Camera? Identifying the Authorship of a Photograph", arXiv.org.