For questions, please email nsantava@pme.duth.gr
Abstract
Vision-based human pose estimation is a non-invasive technology for human-computer interaction (HCI). Using the hand directly as an input device is an attractive interaction method that requires no specialized equipment, such as exoskeletons or gloves, only a camera and a processing platform. Various applications rely on algorithms capable of estimating a hand's pose, including the control of robotic systems, video games, and computer-generated imagery (CGI). In this letter, we present a novel Convolutional Neural Network architecture, reinforced with a Self-Attention module, which can be deployed on an embedded system thanks to its lightweight nature, with just 1.9 million parameters.
Method Overview
The presented architecture is based on the very successful
idea of DenseNets. In a DenseNet, each layer obtains
additional inputs from all preceding ones and propagates its
own feature-maps to all subsequent layers, by a channel-wise
concatenation.
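
The dense connectivity pattern described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `toy_layer`, `dense_block`, the growth rate, and the random 1x1 projection are hypothetical stand-ins chosen only to show how channel-wise concatenation grows the feature map.

```python
import numpy as np

def toy_layer(x, growth_rate, rng):
    """Hypothetical stand-in for a conv layer: random 1x1 projection + ReLU."""
    c_in = x.shape[-1]
    w = rng.standard_normal((c_in, growth_rate)) * 0.1
    return np.maximum(x @ w, 0.0)

def dense_block(x, num_layers, growth_rate, seed=0):
    """Each layer receives the concatenation of all preceding feature maps."""
    rng = np.random.default_rng(seed)
    features = [x]
    for _ in range(num_layers):
        # Channel-wise concatenation of the input and every earlier output.
        layer_input = np.concatenate(features, axis=-1)
        features.append(toy_layer(layer_input, growth_rate, rng))
    # The block output carries the input plus every layer's new channels.
    return np.concatenate(features, axis=-1)

x = np.ones((1, 8, 8, 16))                      # (batch, height, width, channels)
y = dense_block(x, num_layers=3, growth_rate=4)
print(y.shape)                                  # (1, 8, 8, 28): 16 + 3 * 4 channels
```

Because every layer adds only `growth_rate` channels, the layers themselves can stay narrow, which is one reason this pattern suits parameter-constrained models.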
We implement the inverted bottleneck block,
enhanced by an Attention Augmented Convolutional layer,
whose output is added to the output of the Depthwise Separable Convolutional layer, as shown in the following figure.
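
The sketch below illustrates the idea of summing an attention branch with a depthwise separable convolution branch. It rests on several assumptions: plain single-head self-attention stands in for the Attention Augmented Convolutional layer, both branches read the same input, the channel expansion of the inverted bottleneck is omitted, and all function names, shapes, and weights are hypothetical.

```python
import numpy as np

def depthwise_conv3x3(x, kernels):
    """x: (H, W, C); kernels: (3, 3, C). Zero 'same' padding, stride 1."""
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            # Each channel is filtered independently with its own 3x3 kernel.
            out += padded[i:i + h, j:j + w, :] * kernels[i, j, :]
    return out

def pointwise_conv(x, weights):
    """1x1 convolution mixing channels: (H, W, C_in) @ (C_in, C_out)."""
    return x @ weights

def self_attention_2d(x, wq, wk, wv):
    """Single-head self-attention over all H*W spatial positions."""
    h, w, c = x.shape
    tokens = x.reshape(h * w, c)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ v).reshape(h, w, -1)

def block(x, params):
    """Sum of the depthwise separable branch and the attention branch."""
    conv_branch = pointwise_conv(depthwise_conv3x3(x, params["dw"]), params["pw"])
    attn_branch = self_attention_2d(x, params["wq"], params["wk"], params["wv"])
    return conv_branch + attn_branch

rng = np.random.default_rng(0)
c_in, c_out, d_k = 4, 8, 4                          # hypothetical sizes
params = {
    "dw": rng.standard_normal((3, 3, c_in)),
    "pw": rng.standard_normal((c_in, c_out)),
    "wq": rng.standard_normal((c_in, d_k)),
    "wk": rng.standard_normal((c_in, d_k)),
    "wv": rng.standard_normal((c_in, c_out)),
}
x = rng.standard_normal((6, 6, c_in))
y = block(x, params)
print(y.shape)                                      # (6, 6, 8)
```

The convolution branch captures local structure while the attention branch lets every spatial position attend to every other one; adding them keeps the output shape of the convolution unchanged.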
Results
Method | AUC | EPE mean (px) | EPE median (px) |
---|---|---|---|
MPII+NZSL Dataset | | | |
Zimm. et al. (ICCV 2017) | 0.17 | 59.4 | - |
Bouk. et al. (CVPR 2019) | 0.50 | 18.95 | - |
Ours | 0.55 | 16.1 | 11 |
LSMV Dataset | | | |
Gomez-Donoso et al. | - | 10 | - |
Li et al. | - | 8 | - |
Ours | 0.89 | 3.3 | 2.5 |
Stereo Hand Pose Dataset | | | |
Zimm. et al. (ICCV 2017) | 0.81 | 5 | 5.5 |
Ours | 0.92 | 2.2 | 1.8 |
FreiHand Dataset | | | |
Ours | 0.87 | 4 | 3.1 |
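
For readers unfamiliar with the metrics, here is a minimal sketch of how end-point error (EPE) and the area under the PCK curve (AUC) are commonly computed for 2D keypoints. The threshold range (0 to 30 px) and the function names are assumptions made for illustration; the evaluation protocol behind the reported numbers may differ.

```python
import numpy as np

def endpoint_errors(pred, gt):
    """Per-keypoint Euclidean distance in pixels. pred, gt: (N, K, 2)."""
    return np.linalg.norm(pred - gt, axis=-1).ravel()

def pck_auc(errors, max_threshold=30.0, steps=31):
    """Area under the PCK curve, normalised to [0, 1].

    PCK at threshold t is the fraction of keypoints with error <= t.
    The threshold range is an assumed value, not taken from the paper.
    """
    thresholds = np.linspace(0.0, max_threshold, steps)
    pck = np.array([(errors <= t).mean() for t in thresholds])
    # Trapezoidal rule; a perfect curve (PCK = 1 everywhere) yields 1.0.
    area = np.sum((pck[1:] + pck[:-1]) / 2.0 * np.diff(thresholds))
    return area / max_threshold

gt = np.zeros((1, 3, 2))                               # one hand, three keypoints
pred = np.array([[[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]])
errors = endpoint_errors(pred, gt)                     # distances 5, 0 and 10 px
print(errors.mean(), np.median(errors), pck_auc(errors))
```

Lower EPE is better, while higher AUC is better, which is why the two columns move in opposite directions across methods.
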
Component | Arch 1 | Arch 2 | Arch 3 | Arch 4 | Arch 5 | Arch 6 | Arch 7 | Arch 8 | Arch 9 | Arch 10 | Arch 11 | Arch 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Attention module | * | - | - | * | * | - | * | - | - | * | - | * |
Pooling Method | Blur | Blur | Average | Average | Blur | Average | Average | Blur | Max | Max | Max | Max |
Activation Function | Mish | Mish | Mish | Mish | ReLU | ReLU | ReLU | ReLU | Mish | Mish | ReLU | ReLU |
Examples
Citation
If you find this paper useful in your research, please consider citing:
@article{
author={N. {Santavas} and I. {Kansizoglou} and L. {Bampis} and E. {Karakasis} and A. {Gasteratos}},
journal={IEEE Sensors Journal},
title={Attention! A Lightweight 2D Hand Pose Estimation Approach},
year={2020},