Abstract
Uncrewed aerial vehicles (UAVs) play an increasingly important role in ecological monitoring, enabling high-resolution mapping of wildlife, vegetation, and habitats across large and inaccessible areas. However, the development of robust computer vision models for such applications is hindered by the limited availability of diverse and well-annotated UAV datasets, as real-world data acquisition is expensive, logistically constrained, and often ecologically sensitive. In this work, we investigate the use of diffusion-based generative artificial intelligence models to synthesize realistic UAV imagery for ecological monitoring under extreme data scarcity. We fine-tune two state-of-the-art diffusion architectures—Stable Diffusion 3.5 and FLUX.1 Dev—using Low-Rank Adaptation (LoRA) on only 30 curated domain images to generate high-resolution aerial scenes in RGB as well as two thermal styles (inferno and white-hotspot). Generative performance is evaluated using Fréchet Inception Distance (FID), Kernel Inception Distance (KID), and CLIP score, providing complementary assessments of distributional similarity and semantic alignment. Our results demonstrate consistent convergence across all modalities and configurations, with Stable Diffusion 3.5 generally exhibiting faster and more stable adaptation. Despite the minimal training data, the fine-tuned models produce visually coherent and ecologically plausible aerial imagery, confirming the feasibility of parameter-efficient diffusion adaptation for multi-modal UAV data synthesis. These findings highlight the potential of generative diffusion models as a scalable alternative to conventional UAV data collection for ecological monitoring and wildlife analysis, enabling reproducible experimentation and targeted augmentation of rare ecological scenarios.
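The abstract's central metric, Fréchet Inception Distance, reduces to the Fréchet distance between two Gaussians fitted to Inception feature statistics of real and generated images. The following is a minimal sketch of that final computation only; it assumes the feature means and covariances have already been extracted with an Inception network (the paper's actual evaluation pipeline is not described here, and the function name is illustrative).

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; small imaginary
    # components from numerical error are discarded.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Toy check: identical statistics give distance 0; shifting the mean
# by the all-ones vector in 3 dimensions adds ||1||^2 = 3.
mu, sigma = np.zeros(3), np.eye(3)
print(frechet_distance(mu, sigma, mu, sigma))           # ~0.0
print(frechet_distance(mu, sigma, np.ones(3), sigma))   # ~3.0
```

In practice, FID is computed over 2048-dimensional Inception-v3 pooling features, while KID replaces the Gaussian assumption with a polynomial-kernel MMD, making it less biased at small sample sizes such as the 30-image regime studied here.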
| Original language | English |
|---|---|
| Title of host publication | EUROCAST 2026 Computer Aided Systems Theory EXTENDED ABSTRACTS |
| Publication status | Published - Feb 2026 |