Toppled Realities: Challenges in Generation and Validation of Synthetic Data

As automation in infrastructure maintenance advances, collecting comprehensive datasets remains arduous. While synthetic data offers a promising way to address shortages of real data, problems remain in both creating and validating the generated data. Drawing on lessons from our exploratory trials, this position paper aims to push the boundary in generating synthetic data without prior training samples and in validating the synthetic outputs with vision language models (VLMs). Our exploratory trials attempted to generate new toppled road lights in road scene images using several inpainting and image-editing tools, and ultimately resorted to a more deterministic approach of "create, prepare, stylise and inpaint". When validating the synthetic toppled road lights, we explored the possibility of automating prompt engineering and made four main observations. Whilst both exploration and exploitation were evident, responses were sensitive to the text prompts. The model struggled with a dilemma: either adhering to the instruction without achieving good results, or self-hallucinating good results through goal misspecification. From the exploratory trials, we posit that finding the right starting point is important for generating synthetic data that appears real. VLMs could be more widely adopted for detection and validation with more meticulous automated prompt engineering.

Conference Name

9th International Workshop on Annotation of Real World Data for Artificial Intelligent Systems (ARDUOUS 2025). Affiliated to European Conference on Artificial Intelligence (ECAI 2025)

Sponsorship

Engineering and Physical Sciences Research Council (EP/S02302X/1)
Engineering and Physical Sciences Research Council (EP/V056441/1)

This work was supported by the UK Engineering and Physical Sciences Research Council (EPSRC) Centre for Doctoral Training in Future Infrastructure and Built Environment: Resilience in a Changing World (FIBE2) [grant number EP/S02302X/1], with industrial sponsorship from National Highways, Costain and Trimble Solutions. The work was also supported by Digital Roads, UK EPSRC [grant number EP/V056441/1].

Collections

University of Cambridge Research Outputs (Articles and Conferences)