TY - JOUR
T1 - Beyond Color: Advanced RGB-D data augmentation for robust semantic segmentation in crop farming scenes
AU - Kitzler, Florian
AU - Bauer, Alexander
AU - Kruder-Motsch, Viktoria
N1 - Publisher Copyright:
© 2026
PY - 2026/3/1
Y1 - 2026/3/1
N2 - The emergence of smart farming in recent years has substantially increased the importance of artificial vision systems in crop production. Data augmentation is essential for developing robust semantic segmentation models when dealing with small datasets, such as in selective weed control. Due to advances in multi-modal data fusion, RGB-D image datasets contribute substantially to improving model performance. However, most data augmentation techniques primarily modify the color channels, often neglecting the depth channel. Addressing this gap, we introduce three methods for augmenting RGB-D images: RGB-D-Aug, Recompose3D, and Compose3D. We conducted experiments utilizing a multi-modal fusion network tailored for semantic segmentation of different plant species, namely ESANet. RGB-D-Aug introduces artificial depth sensor noise in addition to commonly used geometric transformations and color variations. Recompose3D and Compose3D generate augmented RGB-D images and corresponding ground-truth labels by composing background images and a set of foreground plant snippets. Recompose3D rearranges plants from a given training image, while Compose3D employs all plant snippets available in the training dataset. In our experiments designed to evaluate generalization performance, we tested our three methods and compared them not only to the augmentation technique used in ESANet, which consists of geometric transformations and color channel variations, but also to an extended version of the Copy-Paste method, an image composition technique originally introduced for RGB images. All three of our proposed methods outperformed the ESANet augmentation. The image composition methods, Copy-Paste, Recompose3D, and Compose3D, performed significantly better, with Compose3D achieving the highest generalization performance of all methods tested. In addition to improving model robustness, Compose3D allows the creation of realistic agronomic image scenes. Our research is an important step towards developing robust and generalizable models for different applications in arable farming.
AB - The emergence of smart farming in recent years has substantially increased the importance of artificial vision systems in crop production. Data augmentation is essential for developing robust semantic segmentation models when dealing with small datasets, such as in selective weed control. Due to advances in multi-modal data fusion, RGB-D image datasets contribute substantially to improving model performance. However, most data augmentation techniques primarily modify the color channels, often neglecting the depth channel. Addressing this gap, we introduce three methods for augmenting RGB-D images: RGB-D-Aug, Recompose3D, and Compose3D. We conducted experiments utilizing a multi-modal fusion network tailored for semantic segmentation of different plant species, namely ESANet. RGB-D-Aug introduces artificial depth sensor noise in addition to commonly used geometric transformations and color variations. Recompose3D and Compose3D generate augmented RGB-D images and corresponding ground-truth labels by composing background images and a set of foreground plant snippets. Recompose3D rearranges plants from a given training image, while Compose3D employs all plant snippets available in the training dataset. In our experiments designed to evaluate generalization performance, we tested our three methods and compared them not only to the augmentation technique used in ESANet, which consists of geometric transformations and color channel variations, but also to an extended version of the Copy-Paste method, an image composition technique originally introduced for RGB images. All three of our proposed methods outperformed the ESANet augmentation. The image composition methods, Copy-Paste, Recompose3D, and Compose3D, performed significantly better, with Compose3D achieving the highest generalization performance of all methods tested. In addition to improving model robustness, Compose3D allows the creation of realistic agronomic image scenes. Our research is an important step towards developing robust and generalizable models for different applications in arable farming.
KW - Computer vision
KW - Data augmentation
KW - Precision agriculture
KW - RGB-D semantic segmentation
KW - Smart farming
KW - Weed control
UR - https://www.scopus.com/pages/publications/105027861549
U2 - 10.1016/j.compag.2026.111432
DO - 10.1016/j.compag.2026.111432
M3 - Article
SN - 0168-1699
VL - 244
SP - 111432
JO - Computers and Electronics in Agriculture
JF - Computers and Electronics in Agriculture
M1 - 111432
ER -