TY - GEN
T1 - Using deep learning for depth estimation and 3D reconstruction of humans
AU - Freller, Alexander
AU - Turk, Dora
AU - Zwettler, Gerald A.
N1 - Funding Information:
This research was initiated and guided by AMB Technology GmbH through project TrueSize S.M.B.S. 21650377 with basic funding program number 872105, sponsored by the Austrian Research Promotion Agency FFG and funded from the research budget of the Federal Republic of Austria. Contact: Anna Maria Brunnhofer, CEO.
Publisher Copyright:
© 2020 The Authors.
PY - 2020
Y1 - 2020
AB - Deep learning for depth estimation from a monocular video feed is a common strategy for obtaining rough 3D surface information when no RGB-D camera is available. Depth information is important in many domains, such as object localization, tracking, and scene reconstruction from multiple camera views in robotics and industrial environments. The convolutional neural networks UpProjection, DORN, and Encoder/Decoder are evaluated on hybrid training datasets enriched by CGI data. The UpProjection network achieves the highest accuracy, with a relative deviation of 1.77% and 2.69% for the CAD-120 and SMV datasets, respectively. It is shown that incorporating front and side views improves the achievable depth estimation accuracy for human body images. Incorporating a second view reduces the error from 6.69% to 6.16%. For the target domain of this depth estimation, 3D human body reconstruction from aligned images in T-pose, plain silhouette reconstruction generally leads to acceptable results. Nevertheless, additionally incorporating the rough depth approximation in a future hybrid approach could capture concave areas at the chest, breast, and buttocks, which silhouette reconstruction currently does not handle, and thus yield more realistic 3D body models.
KW - Convolutional Neural Networks
KW - Deep Learning
KW - Depth Estimation
KW - Human Body 3D Reconstruction
UR - http://www.scopus.com/inward/record.url?scp=85097716967&partnerID=8YFLogxK
U2 - 10.46354/i3m.2020.emss.040
DO - 10.46354/i3m.2020.emss.040
M3 - Conference contribution
T3 - 32nd European Modeling and Simulation Symposium, EMSS 2020
SP - 281
EP - 287
BT - 32nd European Modeling and Simulation Symposium, EMSS 2020
A2 - Affenzeller, Michael
A2 - Bruzzone, Agostino G.
A2 - Longo, Francesco
A2 - Petrillo, Antonella
PB - DIME UNIVERSITY OF GENOA
T2 - 32nd European Modeling and Simulation Symposium, EMSS 2020
Y2 - 16 September 2020 through 18 September 2020
ER -