Toppled Realities: Challenges in Generation and Validation of Synthetic Data
Accepted version
Peer-reviewed
Abstract
Automating infrastructure maintenance requires comprehensive datasets, which are arduous to collect. While synthetic data offers a promising way to address shortages of real data, problems remain in creating and validating the generated samples. Drawing on our exploratory trials, this position paper aims to push the boundary in two directions: generating synthetic data without prior training samples, and validating the generated data with vision-language models (VLMs). In the trials, we attempted to generate new toppled road lights in road scene images with several inpainting and image editing tools, and ultimately resorted to a more deterministic approach of "create, prepare, stylise and inpaint". When validating the synthetic toppled road lights, we explored the possibility of automating prompt engineering and made four main observations. Although both exploration and exploitation were evident, responses were sensitive to the text prompts. The model also struggled with a dilemma between adhering to the instruction without achieving good results and, through goal misspecification, hallucinating to obtain good results. From the exploratory trials, we posit that finding the right starting point is important for generating synthetic data that appears real. VLMs can be more widely adopted for detection and validation with more meticulous automated prompt engineering.
Sponsorship
EPSRC (EP/V056441/1)
