Journal: Journal of Imaging
Authors: Ana Romero, Pedro Carvalho, Luís Corte-Real and Américo Pereira
Abstract: “Gathering sufficiently representative data, such as human actions, shapes, and facial expressions, is costly and time-consuming, yet crucial for training robust models. This led to the creation of techniques such as Transfer Learning and Data Augmentation; however, these are often insufficient. To address this, we propose a semi-automated mechanism for generating and editing visual scenes with synthetic humans performing various actions, with features such as background modification and manual adjustment of the 3D avatars that allow users to create data with greater variability. We also propose a twofold methodology for evaluating the results obtained by our method: (i) applying an action classifier to the output data produced by the mechanism; (ii) generating masks of the avatars and the actors and comparing them through segmentation. The avatars were robust to occlusion, and their actions were recognizable and faithful to those of the respective input actors. The results also showed that although the action classifier concentrates on the pose and movement of the synthetic humans, it strongly depends on contextual information to recognize the actions precisely. Generating avatars for complex activities also proved problematic, both for action recognition and for the clean and precise formation of the masks.”
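The abstract's second evaluation step compares segmentation masks of the avatars against those of the real actors. The paper does not specify the comparison metric here, but a common choice for such mask comparisons is intersection-over-union (IoU); the sketch below is an illustrative assumption, not the authors' stated method, and the function name `mask_iou` is hypothetical.

```python
import numpy as np

def mask_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-union between two binary masks of equal shape.

    A plausible way to score how closely an avatar's silhouette
    matches the corresponding actor's silhouette (assumption:
    the paper may use a different segmentation metric).
    """
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    # Empty union means both masks are empty; treat as perfect overlap.
    return float(intersection) / float(union) if union else 1.0

# Toy example with 2x2 masks: overlap in one pixel out of three occupied.
actor = np.array([[1, 1], [0, 0]])
avatar = np.array([[1, 0], [1, 0]])
print(mask_iou(actor, avatar))  # 1/3 ≈ 0.333...
```

A higher IoU would indicate that the generated avatar occupies the same image region as the original actor, which is one way to quantify the “clean and precise formation of the masks” the abstract mentions.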