Shadow of a doubt: human action recognition from silhouettes

Three logical horizontal levels of a human silhouette

Silhouette is an eponym derived from the name of a French finance minister, Étienne de Silhouette, who in 1759 was forced by France’s credit crisis during the Seven Years’ War to impose severe economic demands upon the French people, particularly the wealthy. Because of this austerity, his name became synonymous with anything done cheaply, and people started making silhouette profiles cut from black card, since this was, before the advent of photography, the cheapest way to record a person’s appearance.

It is often difficult to recognize actions and track objects in silhouette images, especially in video. Researchers from India have proposed a novel human action recognition technique that treats an action as a combination of several micro-action sequences performed by one or more body parts. The model estimates the movements of different body parts over any given time segment to classify actions. Several problems make this hard: occlusions, clutter, interactions among multiple objects, and changes in illumination. Further difficulty comes from environmental complexity, which stems from the condition of the scene elements, and from acquisition complexity, which depends on how the video is captured and varies with viewpoint, camera movement and so on. Human actions are also complex in general. A proposed model therefore has to handle all of these challenges.

Flow diagram of the general procedure

The researchers used videos that contain human silhouettes only, and did not consider silhouette extraction techniques. To reduce complexity, the work considered only videos with a single human in each frame. A generic recognition system using human silhouettes consists of three broad substeps: foreground extraction, foreground classification, and feature extraction with action classification. The foreground is extracted by eliminating the background of the video, which reduces the search area in the current frame. Foreground classification determines whether the foreground area contains a human (to avoid analyzing nonhuman objects). Finally, the movement of the human body parts is analyzed across consecutive frames to determine the human action.
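To make the first two substeps concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the researchers' implementation: frames are plain nested lists of grayscale values, the background is a single static reference frame, and the names `extract_foreground`, `bounding_box` and `looks_human`, along with both threshold values, are hypothetical. The aspect-ratio test is only a crude stand-in for a real foreground classifier.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale image: rows of pixel intensities (0..255)
Mask = List[List[int]]   # binary image: 1 = foreground, 0 = background


def extract_foreground(frame: Frame, background: Frame, threshold: int = 30) -> Mask:
    """Mark pixels that differ from the static background by more than `threshold`.

    The threshold value is an assumption chosen for illustration.
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of the foreground, or None if it is empty."""
    if not mask or not mask[0]:
        return None
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows or not cols:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


def looks_human(mask: Mask, min_aspect: float = 1.5) -> bool:
    """Crude placeholder classifier: an upright person is taller than wide.

    `min_aspect` is a hypothetical cut-off, not a value from the study.
    """
    box = bounding_box(mask)
    if box is None:
        return False
    top, left, bottom, right = box
    height = bottom - top + 1
    width = right - left + 1
    return height / width >= min_aspect
```

Restricting later analysis to the bounding box is what reduces the search area in each frame; only masks that pass the human check would go on to feature extraction.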

The work consists of two main steps. The first is the automatic localization of the body, head, hands etc., with an average accuracy of about 97%. The second is the extraction of newly introduced low-dimensional spatio-temporal body-part movement features for human action recognition. The framework uses a rule-based logic set to determine different human actions. The datasets used were the Weizmann Human Action Database and MuHAVi. The proposed technique localized body parts and determined human actions with high accuracy, except for actions performed with the head upside down, sitting or lying. The spatio-temporal body-part movement technique does not require training and is independent of the camera view angle. Experimental results on these publicly available datasets show that the technique outperforms other techniques in terms of success rate. Future work will extend the framework to detect actions in sitting and lying positions as well, making the detection process more comprehensive.
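The idea of training-free, rule-based recognition from body-part movement can be sketched as follows. This is a simplified illustration, not the published method: it assumes the silhouette is split into three equal horizontal bands, tracks only the horizontal drift of each band's centroid between consecutive frames, and applies a made-up rule set. The function names, the `moving` threshold and the action labels are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Mask = List[List[int]]  # binary silhouette: 1 = human pixel, 0 = background
BANDS = ("upper", "middle", "lower")


def band_centroids(mask: Mask) -> Dict[str, Optional[Tuple[float, float]]]:
    """Split the silhouette's height into three equal bands; return each (row, col) centroid."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return {name: None for name in BANDS}
    top, height = rows[0], rows[-1] - rows[0] + 1
    result: Dict[str, Optional[Tuple[float, float]]] = {}
    for i, name in enumerate(BANDS):
        start = top + (i * height) // 3
        end = top + ((i + 1) * height) // 3  # exclusive
        pixels = [(r, c) for r in range(start, end) for c, v in enumerate(mask[r]) if v]
        if pixels:
            n = len(pixels)
            result[name] = (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)
        else:
            result[name] = None  # band too thin or empty
    return result


def movement_features(masks: List[Mask]) -> Dict[str, float]:
    """Mean absolute horizontal centroid shift per band over consecutive frames."""
    totals = {name: 0.0 for name in BANDS}
    counts = {name: 0 for name in BANDS}
    if not masks:
        return totals
    previous = band_centroids(masks[0])
    for mask in masks[1:]:
        current = band_centroids(mask)
        for name in BANDS:
            prev_c, cur_c = previous[name], current[name]
            if prev_c is not None and cur_c is not None:
                totals[name] += abs(cur_c[1] - prev_c[1])
                counts[name] += 1
        previous = current
    return {name: totals[name] / counts[name] if counts[name] else 0.0 for name in BANDS}


def classify(features: Dict[str, float], moving: float = 0.5) -> str:
    """Hypothetical rule set; the threshold and labels are assumptions."""
    upper, middle, lower = (features[name] for name in BANDS)
    if max(upper, middle, lower) <= moving:
        return "still"
    if min(upper, middle, lower) > moving:
        return "whole_body"
    if lower <= moving:
        return "upper_body_only"
    return "unclassified"
```

Because the rules are fixed thresholds on three numbers per time segment, nothing has to be learned from labelled data, which is the property the article highlights. A real rule set would need more features than horizontal drift to separate actions such as bending from waving.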